Reversal and removing of duplicates in a sentence

I am preparing for an interview. One of the questions is to reverse a sentence, such as "its a awesome day" to "day awesome a its". After this, they asked whether duplicates can be removed as well, such as turning "I am good, Is he good" into "good he is, am I".
For reversing the sentence I have written the following method:
public static string reversesentence(string one)
{
StringBuilder builder = new StringBuilder();
string[] split = one.Split(' ');
for (int i = split.Length-1; i >= 0; i--)
{
builder.Append(split[i]);
builder.Append(" ");
}
return builder.ToString();
}
But I can't think of a way to remove the duplicates. Can I get some help here?

This works:
public static string reversesentence(string one)
{
Regex reg = new Regex("\\w+");
bool isFirst = true;
var usedWords = new HashSet<String>(StringComparer.InvariantCultureIgnoreCase);
return String.Join("", one.Split(' ').Reverse().Select((w => {
var trimmedWord = reg.Match(w).Value;
if (trimmedWord != null) {
var wasFirst = isFirst;
isFirst = false;
if (usedWords.Contains(trimmedWord)) //Is it duplicate?
return w.Replace(trimmedWord, ""); //Remove the duplicate phrase but keep punctuation
usedWords.Add(trimmedWord);
if (!wasFirst) //If it's the first word, don't add a leading space
return " " + w;
return w;
}
return null;
})));
}
Basically, we decide whether a word is distinct based on the word without its punctuation. If it already exists, we return just the punctuation. If it doesn't exist, we return the whole word including punctuation.
Punctuation also removes the space in your example, which is why we can't just use String.Join(" ", ...); otherwise the result would be good he Is , am I instead of good he Is, am I.
Test:
reversesentence("I am good, Is he good").Dump();
Result:
good he Is, am I

For plain reversal:
String.Join(" ", text.Split(' ').Reverse())
For reversal with duplicate removal:
String.Join(" ", text.Split(' ').Reverse().Distinct())
Both work fine for strings that use only spaces as the separator. Once you introduce the "," the problem becomes more difficult, so much so that you need to specify how it should be handled. For example, should "I am good, Is he good" become "good he Is am I" or "good he Is , am I"? Your example in the question changes the case of "Is" and groups the "," with it too. That seems wrong to me.

The other answer points to using abstractions but interviewers usually want to see implementation.
For the reversal, the usual trick is to reverse the whole sentence first and then reverse each word as you travel from left to right. A space tells you that you have reached the end of a word. (See Programming Interviews Exposed for a solution to this, or just google it. This used to be a VERY popular interview question.) Your approach works but is frowned upon because it uses O(n) extra space.
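A minimal sketch of that reverse-everything-then-reverse-each-word trick might look like the following. The names SentenceReverser, ReverseRange and ReverseWords are made up for illustration. Note that C# strings are immutable, so this version still copies the input into a char[]; the "no extra space" argument only really applies when the interviewer hands you a mutable buffer.

```csharp
using System;

static class SentenceReverser
{
    // Reverses the characters of buffer between start and end (inclusive), in place.
    static void ReverseRange(char[] buffer, int start, int end)
    {
        while (start < end)
        {
            char tmp = buffer[start];
            buffer[start] = buffer[end];
            buffer[end] = tmp;
            start++;
            end--;
        }
    }

    // Reverses the word order: reverse the whole buffer, then reverse each word back.
    public static string ReverseWords(string sentence)
    {
        // Assumption: words are separated by single space characters only.
        char[] buffer = sentence.ToCharArray();
        ReverseRange(buffer, 0, buffer.Length - 1);

        int wordStart = 0;
        for (int i = 0; i <= buffer.Length; i++)
        {
            // A space (or the end of the buffer) marks the end of a word.
            if (i == buffer.Length || buffer[i] == ' ')
            {
                ReverseRange(buffer, wordStart, i - 1);
                wordStart = i + 1;
            }
        }
        return new string(buffer);
    }
}
```

Calling SentenceReverser.ReverseWords("its a awesome day") should give back "day awesome a its".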
For removing duplicates, if you're only working with ASCII, you can do the following:
bool[] seenChars = new bool[128];
var sb = new StringBuilder();
foreach(char c in stringOne)
{
if(!seenChars[c]){
seenChars[c] = true;
sb.Append(c);
}
}
return sb.ToString();
The idea is to use the value of the char as an index into the array, which tells you whether you've seen this character before. With this approach, the lookup table takes O(1) space, since its size is fixed at 128 entries regardless of the input length.
Edit: If you want to de-duplicate words, you probably want to use a HashSet and skip adding it if it already exists.
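For the word-level version, a rough sketch with a HashSet could look like this. WordDeduplicator and ReverseDistinctWords are hypothetical names, and the sketch assumes that trailing commas and periods are simply dropped and that words are compared case-insensitively, which may or may not be what the interviewer wants.

```csharp
using System;
using System.Collections.Generic;

static class WordDeduplicator
{
    // Walks the words from last to first and keeps only the first occurrence of each.
    public static string ReverseDistinctWords(string sentence)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var kept = new List<string>();
        string[] words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = words.Length - 1; i >= 0; i--)
        {
            // Assumption: trailing commas and periods are not part of the word.
            string word = words[i].TrimEnd(',', '.');

            // HashSet.Add returns false when the word is already in the set.
            if (word.Length > 0 && seen.Add(word))
            {
                kept.Add(word);
            }
        }
        return string.Join(" ", kept);
    }
}
```

With the sentence from the question, WordDeduplicator.ReverseDistinctWords("I am good, Is he good") should return "good he Is am I" (punctuation is lost under the assumption above).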

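Try this:
string sentence = "I am good, Is he good";
// RemoveEmptyEntries drops the empty string produced between "," and the following space
var words = sentence.Split(new char[]{' ',','}, StringSplitOptions.RemoveEmptyEntries).Distinct(StringComparer.CurrentCultureIgnoreCase);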
var stringBuilder = new StringBuilder();
foreach(var item in words)
{
stringBuilder.Append(item);
stringBuilder.Append(" ");
}
Console.Write(stringBuilder);
Console.ReadLine();

Related

how to find text in a string in c#

I am learning .NET and C# on my own.
How can I find whether a given text exists in a string and, if it exists, count how many times the word is repeated in that string? And even if the word is misspelled, how can I find it and report that it is misspelled?
We can do this with collections or LINQ in C#, but here I used the String class and its Contains method, and I am stuck after that.
If we can do this with the help of LINQ, how?
LINQ works with collections, right?
You need a list in order to use LINQ,
but here we are working with a string (a paragraph).
How can LINQ be used to find a word in a paragraph?
Kindly help.
Here is what I have tried so far:
string str = "Education is a ray of light in the darkness. It certainly is a hope for a good life. Eudcation is a basic right of every Human on this Planet. To deny this right is evil. Uneducated youth is the worst thing for Humanity. Above all, the governments of all countries must ensure to spread Education";
for(int i = 0; i < i++)
if (str.Contains("Education") == true)
{
Console.WriteLine("found");
}
else
{
Console.WriteLine("not found");
}

You can turn a string into a string[] by splitting it on a character or string. Then you can use LINQ:
if(str.Split().Contains("makes"))
{
// note that the default Split without arguments also includes tabs and new-lines
}
If you don't care whether it is a word or just a sub-string, you can use str.Contains("makes") directly.
If you want to compare in a case insensitive way, use the overload of Contains:
if(str.Split().Contains("makes", StringComparer.InvariantCultureIgnoreCase)){}

string str = "money makes many makes things";
var strArray = str.Split(' '); // the char overload is available on every .NET version
var count = strArray.Count(x => x == "makes");

The simplest way is to use the Split method to split the string into an array of words.
Here is an example:
var words = str.Split(' ');
if(words.Length > 0)
{
foreach(var word in words)
{
if(word.IndexOf("makes", StringComparison.InvariantCultureIgnoreCase) != -1)
{
Console.WriteLine("found");
}
else
{
Console.WriteLine("not found");
}
}
}
Now, since you just want the number of occurrences of the word, you can use LINQ to do that in a single line like this:
var totalOccurrences = str.Split(' ').Count(x=> x.IndexOf("makes", StringComparison.InvariantCultureIgnoreCase) != -1);
Note that StringComparison.InvariantCultureIgnoreCase is required if you want a case-insensitive comparison.
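One thing to keep in mind is that IndexOf matches substrings, so a word like "remakes" would also be counted. If only whole words should count, a sketch along these lines may help. WordCounter and CountWord are made-up names, and the separator list is an assumption about what counts as punctuation in your text.

```csharp
using System;
using System.Linq;

static class WordCounter
{
    // Counts whole-word, case-insensitive occurrences of word in text.
    public static int CountWord(string text, string word)
    {
        // Assumption: words are separated by white space or this small set of punctuation marks.
        char[] separators = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

        return text
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Count(w => string.Equals(w, word, StringComparison.OrdinalIgnoreCase));
    }
}
```

For example, WordCounter.CountWord("money makes many makes things", "makes") should return 2, while "He remakes what she makes." yields 1 because "remakes" is a different word.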

Loop iteration of an array string

I'm currently trying to solve a Title Capitalization problem. I have a method that takes in a sentence, splits it into words, and compares the words with a check list of words.
Based on this check list, I lowercase the words if they are in the list and capitalize any words not in the list. The first and last words are always capitalized.
Here is my method:
public string TitleCase(string title)
{
LinkedList<string> wordsList = new LinkedList<string>();
string[] listToCheck = { "a", "the", "to", "in", "with", "and", "but", "or" };
string[] words = title.Split(null);
var last = words.Length - 1;
var firstWord = CapitalizeWord(words[0]);
var lastWord = CapitalizeWord(words[last]);
wordsList.AddFirst(firstWord);
for (var i = 1; i <= last - 1; i++)
{
foreach (var s in listToCheck)
{
if (words[i].Equals(s))
{
wordsList.AddLast(LowercaseWord(words[i]));
}
else
{
wordsList.AddLast(CapitalizeWord(words[i]));
}
}
}
wordsList.AddLast(lastWord);
var sentence = string.Join(" ", wordsList);
return sentence;
}
Running this with the example and expecting the result:
var result = TitleCase("i love solving problems and it is fun");
Assert.AreEqual("I Love Solving Problems and It Is Fun", result);
I get instead:
"I Love Love Love Love Love Love Love Love Solving Solving Solving Solving Solving Solving Solving Solving Problems Problems Problems Problems Problems Problems Problems Problems And And And And And and And And It It It It It It It It Is Is Is Is Is Is Is Is Fun"
If you look closely, one "and" is lowercased. Any tips on how to solve this?

You're doing some extra looping when you go through each of the words to check, and you're not exiting the loop as soon as you find a match (so you're adding the word on each check). To fix this issue in your specific code, you would do something like:
for (var i = 1; i <= last - 1; i++)
{
bool foundMatch = false;
foreach (var s in listToCheck)
{
if (words[i].Equals(s))
{
foundMatch = true;
break;
}
}
if (foundMatch)
{
wordsList.AddLast(LowercaseWord(words[i]));
}
else
{
wordsList.AddLast(CapitalizeWord(words[i]));
}
}
However, there is a much easier way, which other answers have provided. But I wanted to point out a couple of other things:
You are creating an unnecessary LinkedList. You already have a list of the words you can manipulate in the words array, so you'll save some memory by just using that.
I think there is a bug in your code (and in some of the answers): if someone passes in a string with a capitalized list word such as "A" in the middle, it will not be converted to lowercase, because the Equals method (or in the case of other answers, the Contains method) does a case-sensitive comparison by default. So you might want to pass a case-insensitive comparer to that method.
You don't need to do separate checks for the first and last word. You can just have a single if statement with these checks in the body of your loop.
So, here's what I would do:
public static string TitleCase(string title)
{
var listToCheck = new[]{ "a", "the", "to", "in", "with", "and", "but", "or" };
var words = title.Split(null);
// Loop through all words in the array
for (int i = 0; i < words.Length; i++)
{
// If we're on the first or last index, or if
// the word is not in our list, Capitalize it
if (i == 0 || i == (words.Length - 1) ||
!listToCheck.Contains(words[i], StringComparer.OrdinalIgnoreCase))
{
words[i] = CapitalizeWord(words[i]);
}
else
{
words[i] = LowercaseWord(words[i]);
}
}
return string.Join(" ", words);
}
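The question never shows CapitalizeWord and LowercaseWord, so the following is only a guess at what they might look like; the class name TitleCaseHelpers is invented, and in practice the two methods would sit next to TitleCase in the same class.

```csharp
using System;

static class TitleCaseHelpers
{
    // Hypothetical helper: upper-cases the first character and lower-cases the rest.
    public static string CapitalizeWord(string word)
    {
        if (string.IsNullOrEmpty(word))
        {
            return word; // nothing to capitalize, e.g. after splitting on double spaces
        }
        return char.ToUpperInvariant(word[0]) + word.Substring(1).ToLowerInvariant();
    }

    // Hypothetical helper: lower-cases the whole word.
    public static string LowercaseWord(string word)
    {
        return string.IsNullOrEmpty(word) ? word : word.ToLowerInvariant();
    }
}
```

The empty-string guard matters because title.Split(null) can produce empty entries when the title contains consecutive white-space characters.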

You have a loop within a loop, which messes things up. Simplify the code to have just one loop:
for (var i = 1; i <= last - 1; i++)
{
// No inner loop
// Use the .Contains() method to see if it's a key word
if (listToCheck.Contains(words[i]))
{
wordsList.AddLast(LowercaseWord(words[i]));
}
else
{
wordsList.AddLast(CapitalizeWord(words[i]));
}
}
Output:
I Love Solving Problems and It Is Fun

The problem is in the foreach loop: you are doing eight checks (the length of the listToCheck array) for each word, and adding the word to the list each time. I'd also recommend using the LINQ Contains method, so it should look like this:
for (var i = 1; i <= last - 1; i++) {
if(listToCheck.Contains(words[i]))
wordsList.AddLast(LowercaseWord(words[i]));
else
wordsList.AddLast(CapitalizeWord(words[i]));
}
Also, the reason the sixth "and" is lowercased is that "and" is the sixth word in the listToCheck array. On the sixth pass through the foreach loop the test succeeds and the word is written in lower case; on all the other passes it fails, so the word is capitalized.

As mentioned in the other answers, the loop within the loop doesn't exit.
Just a suggestion: with LINQ you can combine the first/last word check (through the index) and the listToCheck lookup:
public string TitleCase(string title)
{
string[] listToCheck = { "a", "the", "to", "in", "with", "and", "but", "or" };
string[] words = title.Split(null);
var last = words.Length - 1;
return string.Join(" ", words.Select(w=>w.ToLower()).Select(((w,i) => i == 0 || i == last || !listToCheck.Contains(w) ? CapitalizeWord(w) : w)));
}
Note, in this solution the first Select makes sure all words are in lowercase, so the lookup in listToCheck can be done without special comparisons. Because the words are already in lowercase, that doesn't have to be done any more if the word doesn't have to be capitalized.

C# implementation of Dictionary to count occurrences of words returns duplicate words in output

I recently made a little application to read in a text file of lyrics, then use a Dictionary to calculate how many times each word occurs. However, for some reason I'm finding instances in the output where the same word occurs multiple times with a tally of 1, instead of being added onto the original tally of the word. The code I'm using is as follows:
StreamReader input = new StreamReader(path);
String[] contents = input.ReadToEnd()
.ToLower()
.Replace(",","")
.Replace("(","")
.Replace(")", "")
.Replace(".","")
.Split(' ');
input.Close();
var dict = new Dictionary<string, int>();
foreach (String word in contents)
{
if (dict.ContainsKey(word))
{
dict[word]++;
}else{
dict[word] = 1;
}
}
var ordered = from k in dict.Keys
orderby dict[k] descending
select k;
using (StreamWriter output = new StreamWriter("output.txt"))
{
foreach (String k in ordered)
{
output.WriteLine(String.Format("{0}: {1}", k, dict[k]));
}
output.Close();
timer.Stop();
}
The text file I'm inputting is here: http://pastebin.com/xZBHkjGt (it's the lyrics of the top 15 rap songs, if you're curious)
The output can be found here: http://pastebin.com/DftANNkE
A quick ctrl-F shows that "girl" occurs at least 13 different times in the output. As far as I can tell, it is the exact same word, unless there's some sort of difference in ASCII values. Yes, there are some instances on there with odd characters in place of an apostrophe, but I'll worry about those later. My priority is figuring out why the exact same word is being counted 13 different times as different words. Why is this happening, and how do I fix it? Any help is much appreciated!

Another way is to split on non-word characters.
var lyrics = "I fly with the stars in the skies I am no longer tryin' to survive I believe that life is a prize But to live doesn't mean your alive Don't worry bout me and who I fire I get what I desire, It's my empire And yes I call the shots".ToLower();
var contents = Regex.Split(lyrics, @"[^\w'+]");
Also, here's an alternative (and probably more obscure) loop:
int value;
foreach (var word in contents)
{
dict[word] = dict.TryGetValue(word, out value) ? ++value : 1;
}
dict.Remove("");

If you notice, the repeat occurrences appear on a line following a word which apparently doesn't have a count.
You're not stripping out newlines, so em\r\ngirl is being treated as a different word.

String[] contents = input.ReadToEnd()
.ToLower()
.Replace(",", "")
.Replace("(", "")
.Replace(")", "")
.Replace(".", "")
.Split("\r\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
Works better.
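If tabs or other white space might show up as well, a variation is to let Split use its default white-space delimiters by passing a null separator array. The sketch below is only one way to put that together; LyricsCounter and CountWords are hypothetical names, and the set of trimmed punctuation simply mirrors the characters removed in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LyricsCounter
{
    // Builds a word -> count table, treating any white space as a word separator.
    public static Dictionary<string, int> CountWords(string text)
    {
        return text
            .ToLowerInvariant()
            // A null separator array makes Split treat every white-space character as a delimiter.
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            // Assumption: only these punctuation marks need to be stripped from word edges.
            .Select(w => w.Trim(',', '.', '(', ')'))
            .Where(w => w.Length > 0)
            .GroupBy(w => w)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

With this, an input such as "girl,\r\ngirl (Girl)\tboy." should produce a count of 3 for "girl" and 1 for "boy".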

Add Trim to each word:
foreach (String word in contents.Select(w => w.Trim()))

How to capitalize only the first letter of a string, while lowercasing the rest?

My string is:
1 STATE OF GOA THROUGH CHIEF
I want the output to be like
1 State of goa through chief
How can I keep the first letter capitalized and convert the others to lowercase? I used .ToLower(), but it converts all the letters to lowercase.

string s = "1 STATE OF GOA THROUGH CHIEF";
bool sawLetter = false;
StringBuilder sb = new StringBuilder(s.Length);
foreach (char c in s) {
if (!sawLetter && Char.IsLetter(c)) {
sb.Append(Char.ToUpperInvariant(c));
sawLetter = true;
}
else {
sb.Append(Char.ToLowerInvariant(c));
}
}
Console.WriteLine(sb.ToString());
You could get super fancy and write this as an aggregate query in LINQ, but that would be a case of fancy coding syndrome. Just make this an extension method and move on.
Note that this is at least an order of magnitude more maintainable than using Substring to split the string into two pieces.
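Wrapped up as an extension method, the loop above might look roughly like this. The names StringCaseExtensions and CapitalizeFirstLetter are placeholders, not anything from the framework.

```csharp
using System;
using System.Text;

static class StringCaseExtensions
{
    // Upper-cases the first letter found in s and lower-cases every other character.
    public static string CapitalizeFirstLetter(this string s)
    {
        if (string.IsNullOrEmpty(s))
        {
            return s;
        }

        var sb = new StringBuilder(s.Length);
        bool sawLetter = false;
        foreach (char c in s)
        {
            if (!sawLetter && char.IsLetter(c))
            {
                // The first letter may come after digits or spaces, as in "1 STATE ...".
                sb.Append(char.ToUpperInvariant(c));
                sawLetter = true;
            }
            else
            {
                sb.Append(char.ToLowerInvariant(c));
            }
        }
        return sb.ToString();
    }
}
```

Usage would then be "1 STATE OF GOA THROUGH CHIEF".CapitalizeFirstLetter(), which should return "1 State of goa through chief".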

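Try this:
petitioner = respetMyReader["pet_name"].ToString();
// Assumes the first letter sits at index 2, as in "1 STATE ...": keep "1 S" and lowercase the rest
petitioner = petitioner.Substring(0, 3).ToUpper() + petitioner.Substring(3).ToLower();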

Try putting this just before setting the value of HiddenValue4 to Fil_No...
String FinalString = fil_no.Substring(0, 3) + fil_no.Substring(3).ToLower();
This should keep the first three characters of fil_no as they are and make all the others lowercase.

How to capitalize the first character of each word, or the first character of a whole string, with C#?

I could write my own algorithm to do it, but I feel there should be the equivalent to ruby's humanize in C#.
I googled it but only found ways to humanize dates.
Examples:
A way to turn "Lorem Lipsum Et" into "Lorem lipsum et"
A way to turn "Lorem lipsum et" into "Lorem Lipsum Et"

As discussed in the comments of @miguel's answer, you can use TextInfo.ToTitleCase which has been available since .NET 1.1. Here is some code corresponding to your example:
string lipsum1 = "Lorem lipsum et";
// Creates a TextInfo based on the "en-US" culture.
TextInfo textInfo = new CultureInfo("en-US",false).TextInfo;
// Changes a string to titlecase.
Console.WriteLine("\"{0}\" to titlecase: {1}",
lipsum1,
textInfo.ToTitleCase( lipsum1 ));
// Will output: "Lorem lipsum et" to titlecase: Lorem Lipsum Et
It leaves words that are entirely upper case, such as "LOREM LIPSUM ET", unchanged, because it treats them as acronyms, so that "IEEE" (Institute of Electrical and Electronics Engineers) won't become "ieee" or "Ieee".
However, if you only want to capitalize the first character, you can use the solution that is over here… or you could just split the string and capitalize the first word in the list:
string lipsum2 = "Lorem Lipsum Et";
string lipsum2lower = textInfo.ToLower(lipsum2);
string[] lipsum2split = lipsum2lower.Split(' ');
bool first = true;
foreach (string s in lipsum2split)
{
if (first)
{
Console.Write("{0} ", textInfo.ToTitleCase(s));
first = false;
}
else
{
Console.Write("{0} ", s);
}
}
// Will output: Lorem lipsum et

There is another elegant solution:
Define the function ToTitleCase in a static class of your project:
using System.Globalization;
public static string ToTitleCase(this string title)
{
return CultureInfo.CurrentCulture.TextInfo.ToTitleCase(title.ToLower());
}
And then use it like a string extension anywhere in your project:
"have a good day !".ToTitleCase() // "Have A Good Day !"

Using regular expressions for this looks much cleaner:
string s = "the quick brown fox jumps over the lazy dog";
s = Regex.Replace(s, @"(^\w)|(\s\w)", m => m.Value.ToUpper());

All the examples seem to lowercase the other characters first, which isn't what I needed.
customerName = CustomerName <-- Which is what I wanted
this is an example = This Is An Example
public static string ToUpperEveryWord(this string s)
{
// Check for empty string.
if (string.IsNullOrEmpty(s))
{
return string.Empty;
}
var words = s.Split(' ');
var t = "";
foreach (var word in words)
{
t += char.ToUpper(word[0]) + word.Substring(1) + ' ';
}
return t.Trim();
}

If you just want to capitalize the first character, just stick this in a utility method of your own:
return string.IsNullOrEmpty(str)
? str
: char.ToUpperInvariant(str[0]) + str.Substring(1).ToLowerInvariant(); // ToUpperInvariant is a static method on char
There's also a library method to capitalize the first character of every word:
http://msdn.microsoft.com/en-us/library/system.globalization.textinfo.totitlecase.aspx

The CSS technique is OK, but it only changes the presentation of the string in the browser. A better method is to capitalise the text itself before sending it to the browser.
Most of the above implementations are OK, but none of them address the issue of what happens if you have mixed case words that need to be preserved, or if you want to use true Title Case, for example:
"Where to Study PHd Courses in the USA"
or
"IRS Form UB40a"
Also using CultureInfo.CurrentCulture.TextInfo.ToTitleCase(string) preserves upper case words as in
"sports and MLB baseball" which becomes "Sports And MLB Baseball" but if the whole string is put in upper case, then this causes an issue.
So I put together a simple function that allows you to keep the capital and mixed case words and make small words lower case (if they are not at the start and end of the phrase) by including them in a specialCases and lowerCases string arrays:
public static string TitleCase(string value) {
string titleString = ""; // destination string, this will be returned by function
if (!String.IsNullOrEmpty(value)) {
string[] lowerCases = new string[12] { "of", "the", "in", "a", "an", "to", "and", "at", "from", "by", "on", "or"}; // list of lower case words that should only be capitalised at start and end of title
string[] specialCases = new string[7] { "UK", "USA", "IRS", "UCLA", "PHd", "UB40a", "MSc" }; // list of words that need capitalisation preserved at any point in title
string[] words = value.ToLower().Split(' ');
bool wordAdded = false; // flag to confirm whether this word appears in special case list
int counter = 1;
foreach (string s in words) {
// check if word appears in lower case list
foreach (string lcWord in lowerCases) {
if (s.ToLower() == lcWord) {
// if lower case word is the first or last word of the title then it still needs capital so skip this bit.
if (counter == 1 || counter == words.Length) { break; } // counter starts at 1, so 1 is the first word
titleString += lcWord;
wordAdded = true;
break;
}
}
// check if word appears in special case list
foreach (string scWord in specialCases) {
if (s.ToUpper() == scWord.ToUpper()) {
titleString += scWord;
wordAdded = true;
break;
}
}
if (!wordAdded) { // word does not appear in special cases or lower cases, so capitalise first letter and add to destination string
titleString += char.ToUpper(s[0]) + s.Substring(1).ToLower();
}
wordAdded = false;
if (counter < words.Length) {
titleString += " "; //dont forget to add spaces back in again!
}
counter++;
}
}
return titleString;
}
This is just a quick and simple method, and it can probably be improved a bit if you want to spend more time on it.
If you want to keep the capitalisation of smaller words like "a" and "of", then just remove them from the lowerCases string array. Different organisations have different rules on capitalisation.
You can see an example of this code in action on this site: Egg Donation London - this site automatically creates breadcrumb trails at the top of the pages by parsing the url eg "/services/uk-egg-bank/introduction" - then each folder name in the trail has hyphens replaced with spaces and capitalises the folder name, so uk-egg-bank becomes UK Egg Bank. (preserving the upper case 'UK')
An extension of this code could be to have a lookup table of acronyms and uppercase/lowercase words in a shared text file, database table or web service so that the list of mixed case words can be maintained from one single place and apply to many different applications that rely on the function.
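As a rough idea of how such a shared lookup could be wired in, the sketch below builds a case-insensitive table from a list of preferred spellings. Everything here (CasingLookup, BuildLookup, Apply, and the file name in the usage note) is hypothetical; where the list comes from is up to you.

```csharp
using System;
using System.Collections.Generic;

static class CasingLookup
{
    // Builds a table that maps any casing of a word to its preferred casing.
    public static Dictionary<string, string> BuildLookup(IEnumerable<string> preferredForms)
    {
        var lookup = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string form in preferredForms)
        {
            string trimmed = form.Trim();
            if (trimmed.Length > 0)
            {
                lookup[trimmed] = trimmed; // the key comparison ignores case, the value keeps it
            }
        }
        return lookup;
    }

    // Returns the preferred casing if the word is in the table, otherwise the word unchanged.
    public static string Apply(Dictionary<string, string> lookup, string word)
    {
        string preferred;
        return lookup.TryGetValue(word, out preferred) ? preferred : word;
    }
}
```

The entries could be loaded with something like CasingLookup.BuildLookup(File.ReadAllLines("special-cases.txt")), where the file name is made up and holds one preferred spelling per line. A dictionary lookup also replaces the inner foreach over specialCases with a single TryGetValue call.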

There is no prebuilt solution for proper linguistic capitalization in .NET. What kind of capitalization are you going for? Are you following the Chicago Manual of Style conventions? AMA or MLA? Even plain English sentence capitalization has thousands of special exceptions for words. I can't speak to what Ruby's humanize does, but I imagine it likely doesn't follow linguistic rules of capitalization and instead does something much simpler.
Internally, we encountered this same issue and had to write a fairly large amount of code just to handle proper (in our little world) casing of article titles, not even accounting for sentence capitalization. And it indeed does get "fuzzy" :)
It really depends on what you need - why are you trying to convert the sentences to proper capitalization (and in what context)?

I have achieved the same using custom extension methods. For the first letter of the first sub-string, use the method yourString.ToFirstLetterUpper(). For the first letter of every sub-string, excluding articles and some prepositions, use the method yourString.ToAllFirstLetterInUpper(). Below is a console program:
class Program
{
static void Main(string[] args)
{
Console.WriteLine("this is my string".ToAllFirstLetterInUpper());
Console.WriteLine("uniVersity of lonDon".ToAllFirstLetterInUpper());
}
}
public static class StringExtension
{
public static string ToAllFirstLetterInUpper(this string str)
{
var array = str.Split(" ");
for (int i = 0; i < array.Length; i++)
{
if (array[i] == "" || array[i] == " " || listOfArticles_Prepositions().Contains(array[i])) continue;
array[i] = array[i].ToFirstLetterUpper();
}
return string.Join(" ", array);
}
public static string ToFirstLetterUpper(this string str) // public so it can be called from outside this class
{
return str?.First().ToString().ToUpper() + str?.Substring(1).ToLower();
}
private static string[] listOfArticles_Prepositions()
{
return new[]
{
"in","on","to","of","and","or","for","a","an","is"
};
}
}
OUTPUT
This is My String
University of London

Far as I know, there's not a way to do that without writing (or cribbing) code. C# nets (ha!) you upper, lower and title (what you have) cases:
http://support.microsoft.com/kb/312890/EN-US/
