Compare and extract common words between 2 strings - c#

In ASP.NET C#, assume I have a string that contains comma-separated words:
string strOne = "word,WordTwo,another word, a third long word, and so on";
How can I split it and then compare it with another paragraph that may or may not contain these words:
string strTwo = " when search a word or try another word you may find that WordTwo is there with others";
Then how do I output the common words, separated by commas, in a third string:
string strThree = "output1, output2, output3";
To get a result like: "word, WordTwo, another word"

You will need to split strOne by comma, and check each item with Contains against strTwo.
Note: You can't split strTwo by space and use Intersect, because your items may contain spaces, e.g. "another word".
string strOne = "word,WordTwo,another word, a third long word, and so on";
string strTwo = " when search a word or try another word you may find that WordTwo is there with others";
var tokensOne = strOne.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
var list = tokensOne.Where(x => strTwo.Contains(x));
var result = string.Join(", ",list);
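A variation on the snippet above, as a minimal sketch: it trims each item (so " a third long word" is compared without its leading space) and ignores case. The class and method names (CommonWords, Extract) are made up for illustration. Note that this is still substring matching, so "word" would also match inside "sword".

```csharp
using System;
using System.Linq;

static class CommonWords
{
    // Returns the items of the comma-separated list `items` that occur in `text`,
    // joined with ", ". Matching is a case-insensitive substring test.
    public static string Extract(string items, string text)
    {
        var matches = items
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(item => item.Trim())               // drop the space after each comma
            .Where(item => item.Length > 0)
            .Where(item => text.IndexOf(item, StringComparison.OrdinalIgnoreCase) >= 0);
        return string.Join(", ", matches);
    }

    static void Main()
    {
        string strOne = "word,WordTwo,another word, a third long word, and so on";
        string strTwo = " when search a word or try another word you may find that WordTwo is there with others";
        Console.WriteLine(Extract(strOne, strTwo)); // word, WordTwo, another word
    }
}
```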

You could do something like this:
string strOne = "word,WordTwo,another word, a third long word, and so on";
string strTwo = " when search a word or try another word you may find that WordTwo is there with others";
string finalString = string.Empty;
foreach (var line in strOne.Split(','))
{
    if (strTwo.Contains(line))
        finalString += (line + ",");
}
// TrimEnd is safe even when nothing matched and finalString is empty
finalString = finalString.TrimEnd(',');
Console.WriteLine(finalString);

Related

How can I Replace special characters

I've got a string value with a lot of different characters
I want to:
replace TAB,ENTER, with Space
replace Arabic ي with Persian ی
replace Arabic ك with Persian ک
remove newlines from both sides of a string
replace multiple spaces with one space
trim spaces
The following function cleans the data, and it works correctly.
Does anyone have an idea for better performance and less code to maintain? :)
static void Main(string[] args)
{
var output = "كgeeks 01$سهيلاطريقي03. اشك!#!!.ي";
//output = output.Replace("\u064A", "\u0649");//ي
output = output.Replace("\u064A", "\u06CC");//replace arabic ي with persian ی
output = output.Replace("\u0643", "\u06A9");//replace arabic ك with persian ک
output = output.Trim('\r', '\n');//remove newlines from both sides of a string
output = output.Replace("\n", "").Replace("\r", " ");//replace newline with space
RegexOptions options = RegexOptions.None;
Regex regex = new Regex("[ ]{2,}", options);//replace multiple space with one space
output = regex.Replace(output, " ");
char tab = '\u0009';
output = output.Replace(tab.ToString(), "");
Console.WriteLine(output);
}
You can refactor using two lists: one for the trim process and one for the replace process.
var itemsTrimChars = new List<char>()
{
'\r',
'\n'
};
var itemsReplaceStrings = new Dictionary<string, string>()
{
{ "\n", "" },
{ "\r", " " },
{ "\u064A", "\u06CC" },
{ "\u0643", "\u06A9" },
{ "\u0009", "" }
}.ToList();
This gives you maintainable tables that can live wherever you want: local variables as in this example, class-level fields, database tables, text files on disk, and so on.
Use them like this:
// Trim all listed characters in one call; trimming them one at a time can leave some behind
output = output.Trim(itemsTrimChars.ToArray());
itemsReplaceStrings.ForEach(p => output = output.Replace(p.Key, p.Value));
I know nothing about the regex that replaces double spaces, but if you need to collapse other repeated characters, you can create a third list.
You can do this by iterating over each character and applying those rules, building a new output string in the format you want. It should be faster than all those string.Replace and Regex.Replace calls.
Use a StringBuilder for performance when appending; don't use string += string.
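The single-pass idea can be sketched roughly as follows. This is only an illustration under some assumptions: the names TextCleaner and Clean are made up, and the rules follow the goals listed in the question (tabs and newlines become spaces, runs of spaces collapse to one, the result is trimmed), which is slightly different from the asker's code, where tabs and "\n" are removed outright.

```csharp
using System;
using System.Text;

static class TextCleaner
{
    // Applies all the cleaning rules in one pass over the input.
    public static string Clean(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool lastWasSpace = true; // true at the start so leading spaces are skipped

        foreach (char ch in input)
        {
            char c = ch;
            switch (c)
            {
                case '\t':
                case '\r':
                case '\n':
                    c = ' ';          // tab and newline characters become a space
                    break;
                case '\u064A':
                    c = '\u06CC';     // Arabic yeh -> Persian yeh
                    break;
                case '\u0643':
                    c = '\u06A9';     // Arabic kaf -> Persian kaf
                    break;
            }

            if (c == ' ')
            {
                if (!lastWasSpace) sb.Append(c); // collapse runs of spaces
                lastWasSpace = true;
            }
            else
            {
                sb.Append(c);
                lastWasSpace = false;
            }
        }

        // At most one trailing space can remain; drop it.
        if (sb.Length > 0 && sb[sb.Length - 1] == ' ') sb.Length--;
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine("[" + Clean("  a\t\tb\r\n ") + "]"); // [a b]
    }
}
```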
First find the character in your string, then remove it and insert the new character at the same index. Note that this replaces only the first occurrence:
private string ReplaceChars(string Source, string Find, string Replace)
{
    int Place = Source.IndexOf(Find);
    if (Place < 0) return Source; // nothing to replace
    string result = Source.Remove(Place, Find.Length).Insert(Place, Replace);
    return result;
}
Usage:
string text = "كgeeks 01$سهيلاطريقي03. اشك!#!!.ي";
var result = ReplaceChars(text, "ي", "ی");

Complex string split C#

I have input file like this:
input.txt
aa@aa.com bb@bb.com "Information" "Hi there"
cc@cc.com dd@dd.com "Follow up" "Interview"
I have used this method:
string[] words = item.Split(' ');
However, it splits at every space. The quoted strings also contain spaces, and I don't want to split on those.
Basically I want to parse this input from the file into this output:
From = aa@aa.com
To = bb@bb.com
Subject = Information
Body = Hi there
How do I split these strings in C#?
You can simply use Regex, as described in this question:
var stringValue = "aa@aa.com bb@bb.com \"Information\" \"Hi there\"";
var parts = Regex.Matches(stringValue, @"[\""].+?[\""]|[^ ]+")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
// parts: aa@aa.com
//        bb@bb.com
//        "Information"
//        "Hi there"
You can also use the Replace function to remove those " characters.
The String.Split() method has an overload that allows you to specify the number of splits required. You can get what you want like this:
Read one line at a time
Call input.Split(new string[] { " " }, 3, StringSplitOptions.None) - this returns an array of strings with 3 parts. Since email addresses don't have spaces in them, the first two strings will be the from/to addresses, and the third string will be the subject and message. Assume the result of this call is stored in firstSplit[], then firstSplit[0] is the from address, firstSplit[1] is the to address, and firstSplit[2] is the subject and message combined.
Call firstSplit[2].Split(new string[] { @""" """ }, 2, StringSplitOptions.None) - this searches for the string " " (quote, space, quote) in the concatenated subject+message from the previous call, which should pinpoint the separator between the end of the subject and the start of the message. This will give you the subject and message in another array. (The double quotes inside the verbatim string are doubled to escape them.)
This assumes you disallow double quotes in your subject and message. If you do allow double quotes, then you need to ensure you escape them before putting it in the file in the first place.
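The two Split calls described above could be put together roughly like this. It is a sketch only: MailLineParser and Parse are hypothetical names, and it assumes every line has exactly the shape shown in the question (two addresses followed by two quoted strings, with no quotes inside them).

```csharp
using System;

static class MailLineParser
{
    // Returns { from, to, subject, body } for one input line.
    public static string[] Parse(string line)
    {
        // Step 1: at most 3 parts: from, to, and everything else.
        string[] firstSplit = line.Split(new[] { ' ' }, 3);

        // Step 2: split the remainder at the quote-space-quote between subject and body.
        string[] rest = firstSplit[2].Split(new[] { "\" \"" }, 2, StringSplitOptions.None);

        return new[]
        {
            firstSplit[0],
            firstSplit[1],
            rest[0].TrimStart('"'), // remove the opening quote of the subject
            rest[1].TrimEnd('"')    // remove the closing quote of the body
        };
    }

    static void Main()
    {
        string[] fields = Parse("aa@aa.com bb@bb.com \"Information\" \"Hi there\"");
        Console.WriteLine("From = " + fields[0]);
        Console.WriteLine("To = " + fields[1]);
        Console.WriteLine("Subject = " + fields[2]);
        Console.WriteLine("Body = " + fields[3]);
    }
}
```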
You can do this without regex by using IndexOf and Substring; just put it in a loop if you have multiple lines to parse.
It's not pretty, but it would be faster than Regex if you're doing a lot of them.
string content = @"abba@aa.com dddb@bdd.com ""Information"" ""Hi there""";
string firstEmail = content.Substring(0, content.IndexOf(" ", StringComparison.Ordinal));
// Start one past the first space so the second address has no leading space
int secondStart = firstEmail.Length + 1;
string secondEmail = content.Substring(secondStart, content.IndexOf(" ", secondStart, StringComparison.Ordinal) - secondStart);
int firstQuote = content.IndexOf("\"", StringComparison.Ordinal);
// Everything from the first quote to the end is the subject plus the message
string subjectandMessage = content.Substring(firstQuote);
String[] words = subjectandMessage.Split(new string[] { "\" \"" }, StringSplitOptions.None);
Console.WriteLine(firstEmail);
Console.WriteLine(secondEmail);
Console.WriteLine(words[0].Remove(0,1));
Console.WriteLine(words[1].Remove(words[1].Length -1));
Output:
abba@aa.com
dddb@bdd.com
Information
Hi there
As Spencer pointed out, read the file line by line using the File.ReadAllLines() method, then apply the String.Split() method on whitespace to each line, using something like this:
string[] elements = line.Split(new char[0]);
UPDATE
Not a pretty solution, but this is how I think it can work:
string[] readText = File.ReadAllLines("input.txt");
foreach (string line in readText)
{
    string[] elements = line.Split(new char[0]);
    // Take the value of the first 3 fields by simple elements[index]; (index: 0-2)
    string temp = "";
    for (int i = 3; i < elements.Length; i++)
    {
        temp += elements[i] + " "; // rebuild the remaining text
    }
}
Requires reference to Microsoft.VisualBasic, but a bit more reliable than Regex:
using (var tfp = new Microsoft.VisualBasic.FileIO.TextFieldParser("input.txt")) {
for (tfp.SetDelimiters(" "); !tfp.EndOfData;) {
string[] fields = tfp.ReadFields();
Debug.Print(string.Join(",", fields)); // "aa@aa.com,bb@bb.com,Information,Hi there"
}
}

Want to Remove all the Characters Except First character in the word

I am new to C#. I need to shorten a sentence that has many words, keeping only the first character of each word. For example,
if a sentence is like this:
input : Bharat Electrical Limited => output : BEL
How do I accomplish this in C#?
Thanks in advance
Try
string sentence = "Bharat Electrical Limited";
var result = sentence.Split(' ').Aggregate("", (current, word) => current + word.Substring(0, 1));
EDIT: Here's a brief explanation:
sentence.Split(' ') splits the string into elements based on space (' ')
.Aggregate("", (current, word) => current + word.Substring(0, 1)); is a LINQ expression that iterates through every word retrieved above and appends the result of an operation on it to the accumulated string
word.Substring(0, 1) returns the first letter of every word
This is the sort of thing that's easily accomplished with a regular expression:
s = Regex.Replace(s, @"(\S)\S*\s*", "$1");
This effectively matches consecutive non-white space characters, followed by white space, and replaces the whole sequence by its first character.
You can do something like this -
string sentence = "Bharat Electrical Limited";
//Split the words
var letters = sentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
//Take the first letter of every word
var myAbbWord = letters.Aggregate(string.Empty, (current, letter) => current + letter.First());
myAbbWord should display BEL for you.
Here is the solution.
I hope it helps.
string str1 = "Bharat Electrical Limited";
var resultList = str1.Split(' ');
string result = resultList.Aggregate(String.Empty, (current, word) => current + word.First());
First, split the string into words, then take the first letter from each word. You can do this with a simple foreach loop like the following:
string inputStr = "Bharat Electrical Limited";
List<char> firstChars = new List<char>();
foreach (string word in inputStr.Split(new char[]{' '},StringSplitOptions.RemoveEmptyEntries))
{
firstChars.Add(word[0]); // Collecting first chars of each word
}
string outputStr = String.Join("", firstChars);
And this is the short way to do it:
string inputStr = "Bharat Electrical Limited";
string shortWord = String.Join("", inputStr.Split(new char[]{' '},StringSplitOptions.RemoveEmptyEntries).Select(x => x[0]));
If the first character of each word is not capitalized, you can use either of the following options.
Option 1: convert the input to title case before performing the action. For this you can use the following code:
inputStr = System.Threading.Thread.CurrentThread.CurrentCulture.TextInfo.ToTitleCase(inputStr.ToLower());
Option 2: convert each character to uppercase while collecting the characters from the words. This can be achieved by:
firstChars.Add(char.ToUpper(word[0])); // For the first case
.Select(x => char.ToUpper(x[0])) // For the second case
Here you can find a working example for all above mentioned cases
Simplest way is :
string inputStr = "Bharat Electrical Limited";
string result = new String(inputStr.Split(' ').Select(word => (word[0])).ToArray());
// BEL
You need to add using System.Linq; to your source file.
Logic is:
Split the string into an array of words (delimited by space), then project this array by selecting the first char of each string. The result is a sequence of the first characters, which ToArray() turns into a char array. Then construct the result string using the String constructor overload that takes a char array.
This might look friendlier to you:
string input = "Bharat Electrical Limited";
string output = string.Join("", input.Split(new string[] {" "}, StringSplitOptions.RemoveEmptyEntries)
.Select(a => a.First()));
First split your input sentence on spaces, then use the First() extension method on each word to get its first character.
Use this method
string inputStr = "Bharat Electrical Limited";
var arrayString = string.Join("", inputStr.Split(' ').Select(x => x[0]));

Make words from string matching with string bold

I want to make words matching in the string bold. I am using Jquery autocomplete with asp.net mvc. My following code works only if string has single word.
label = p.Name.Replace(termToSearch.ToLower(),"<b>" + termToSearch.ToLower() + "</b>"),
But it doesn't work when I have 2 matching words at random positions.
E.g. when I search for Gemini Oil,
my result should be Gemini Sunflower Oil, with Gemini and Oil in bold.
Any ideas?
A single line of Regex can do just that:
String term = "Gemini Oil";
String input = "Gemini Sunflower Oil.";
String result = Regex.Replace( input, String.Join("|", term.Split(' ')), @"<b>$&</b>");
Console.Out.WriteLine(result);
<b>Gemini</b> Sunflower <b>Oil</b>.
You could just split the search term on each space character and then run the replace multiple times:
var terms = termToSearch.Split(' ');
var name = p.Name; // work on a copy of the name rather than on p itself
foreach (var term in terms) {
    name = name.Replace(term.ToLower(), "<b>" + term.ToLower() + "</b>");
}
label = name;
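Both answers above match case-sensitively, and the regex version treats the search terms as pattern text. A hedged variation that escapes each term and ignores case might look like the sketch below. Highlighter and Bold are hypothetical names, and the input is assumed to be plain text that contains no HTML yet.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class Highlighter
{
    // Wraps every occurrence of any search term in <b>...</b>, ignoring case.
    public static string Bold(string input, string termToSearch)
    {
        var terms = termToSearch
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => Regex.Escape(t)); // treat terms as literal text, not as patterns

        string pattern = string.Join("|", terms);
        if (pattern.Length == 0) return input; // nothing to search for

        // $& is the matched text, so the original casing of the input is kept.
        return Regex.Replace(input, pattern, "<b>$&</b>", RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        Console.WriteLine(Bold("Gemini Sunflower Oil.", "gemini oil"));
        // <b>Gemini</b> Sunflower <b>Oil</b>.
    }
}
```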

How to indicate whitespaces while reading from a .txt file

I have a simple .txt file with X,Y-values in it. It is structured like this:
-25.7754 35.87
-22.1233 32.16
-20.361 30.75
etc.
I am able to read single lines or the whole text to the end, with objstream.ReadToEnd(); & objstream.ReadLine().
But here's my question: how can I tell where the string for the first value ends, so I can parse it to a float and then proceed to read the next value?
Here is the read functionality I have so far :)
StreamReader objStream = new StreamReader("C:blablabla\\Text.asc");
textBox1.Text = objStream.ReadLine();
Thanks in advance,
BC++
Use String.Split()
As requested, an example :
string s = "there is a cat";
//
// Split string on spaces.
// ... This will separate all the words.
//
string[] words = s.Split(' ');
foreach (string word in words)
{
Console.WriteLine(word);
}
The output is :
there
is
a
cat
Look at the string.Split methods:
var line1 = objStream.ReadLine();
var lineParts = line1.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
textBox1.Text = lineParts[0];
textBox2.Text = lineParts[1];
Note the use of an overload that takes StringSplitOptions.RemoveEmptyEntries - this means that if you have multiple spaces in succession, the result will not contain empty entries.
If you really mean white-space and not space then you have to go this way:
string line = "-25.7754 35.87";
string[] values = line.Split(new char[] { }, StringSplitOptions.RemoveEmptyEntries);
The difference from the other answers is the splitting character. If no separator is defined, white-space characters are assumed to be the delimiters. In other words, you will get the same result for
string line = "-25.7754\t35.87"; // tab instead of spaces.
This gives you the flexibility to correctly split fixed-length or tab-delimited lines using the same code.
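To go from the split parts to float values, as the question asks, something like the following sketch could work. PointParser and TryParsePoint are made-up names, and it assumes the file always uses '.' as the decimal separator, which is why the invariant culture is passed explicitly.

```csharp
using System;
using System.Globalization;

static class PointParser
{
    // Tries to read two whitespace-separated floats (X and Y) from one line.
    public static bool TryParsePoint(string line, out float x, out float y)
    {
        x = 0f;
        y = 0f;

        // An empty separator array means "split on any white space".
        string[] parts = line.Split(new char[0], StringSplitOptions.RemoveEmptyEntries);

        return parts.Length >= 2
            && float.TryParse(parts[0], NumberStyles.Float, CultureInfo.InvariantCulture, out x)
            && float.TryParse(parts[1], NumberStyles.Float, CultureInfo.InvariantCulture, out y);
    }

    static void Main()
    {
        float x, y;
        if (TryParsePoint("-25.7754 35.87", out x, out y))
        {
            Console.WriteLine(x.ToString(CultureInfo.InvariantCulture));
            Console.WriteLine(y.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

Lines that do not contain two numbers (such as a trailing "etc.") simply return false instead of throwing.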
