C# - Find and remove substring from a list of strings - c#

I have a bad word list. If a string contains any item from the list, I need to remove that bad word from the string.
List<string> badWordList = new List<string> { "email:", "index", "mobile:", "fax:", "web" };
I can detect whether the string contains a bad word, but I can't work out how to remove it. This is what I tried:
string myText = "email:abc@gmail.com";
if (badWordList.Any(w => myText.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0))
{
// How to remove
}
Below are sample inputs and the expected outputs:
i/p- email:abc@gmail.com
o/p- abc@gmail.com
i/p- Jack F. Mobile:89788987
o/p- Jack F. 89788987
i/p- Jack F. Email:t@p.c mobile:65777 WEB
o/p- Jack F. t@p.c 65777
I would prefer a non-regex approach. Thanks for your help.

You can iterate on the bad words and remove them:
foreach (string badWord in badWordList) {
myText = myText.Replace(badWord, string.Empty);
}
If you need a case-insensitive solution, you can use the overload that takes a StringComparison parameter (available in .NET Core 2.0 and later, but not in .NET Framework):
myText = myText.Replace(badWord, string.Empty, StringComparison.OrdinalIgnoreCase);
Note: an older version of this answer, and some of the comments on the question, suggested using a Regex for case-insensitive replacement (see the example below).
Now that the StringComparison overload exists, I think it is better to just use that.
string myText = "EMAIL:abc@gmail.com";
Regex badWords = new Regex("email:|index|mobile:|fax:|web", RegexOptions.IgnoreCase | RegexOptions.Compiled);
myText = badWords.Replace(myText, string.Empty);

You can remove strings by replacing them with the empty string:
foreach (var badWord in badWordList)
{
myText = myText.Replace(badWord, "");
}
Unfortunately this is case-sensitive. For case-insensitive string replacement without regular expressions, see Is there a case insensitive string replace in .Net without using Regex?
You can also do it with a regular expression, in which case case-insensitive comparison comes "for free":
var regex = String.Join("|", badWordList.Select(w => Regex.Escape(w)));
myText = Regex.Replace(myText, regex, "", RegexOptions.IgnoreCase);
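For the case-insensitive, non-regex route that the question asks for, one possible sketch uses IndexOf with StringComparison.OrdinalIgnoreCase together with Remove, both of which exist on .NET Framework as well as .NET Core. The names BadWordFilter and RemoveAll are made up for illustration and are not part of any library:

```csharp
using System;
using System.Collections.Generic;

public static class BadWordFilter
{
    // Hypothetical helper: removes every occurrence of each bad word, ignoring case.
    public static string RemoveAll(string text, IEnumerable<string> badWords)
    {
        foreach (string badWord in badWords)
        {
            // Skip null or empty entries; searching for "" would never terminate.
            if (string.IsNullOrEmpty(badWord))
            {
                continue;
            }

            int index = text.IndexOf(badWord, StringComparison.OrdinalIgnoreCase);
            while (index >= 0)
            {
                // Cut the match out, then keep searching from the same position.
                text = text.Remove(index, badWord.Length);
                index = text.IndexOf(badWord, index, StringComparison.OrdinalIgnoreCase);
            }
        }

        return text;
    }
}
```

Removing a word leaves the surrounding whitespace in place, so an input that ends in a bad word preceded by a space keeps that trailing space; call Trim() on the result if that matters.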

Replace each instance of a 'bad word' with string.Empty:
List<string> badWordList = new List<string> { "email:", "index", "mobile:", "fax:", "web" };
string myText = "email:abc@gmail.com";
foreach (string s in badWordList)
{
myText = myText.Replace(s,string.Empty);
}

using System;
namespace WindowMaker
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Enter main string...");
            string str = Console.ReadLine();
            Console.WriteLine("Enter substring...");
            string sub = Console.ReadLine();
            // Replace removes every occurrence in one pass; the loop repeats in case
            // a removal joins the surrounding text into a new occurrence.
            while (!string.IsNullOrEmpty(sub) && str.Contains(sub))
            {
                str = str.Replace(sub, "");
            }
            Console.WriteLine("Remaining string :: {0}", str);
            Console.Read();
        }
    }
}

If case-sensitive matching is acceptable:
List<String> badWordList = new List<String> { "email:", "index", "mobile:", "fax:", "web" };
String myText = "This is an example of Email: deleting with index and fax and web";
badWordList.ForEach(bw => myText = myText.Replace(bw, String.Empty));

Related

How to Find List Item that is Present in a String

I need to find out whether a string contains one of the exact words in a list.
E.g.:
List<string> KeyWords = new List<string>(){"Test","Re Test","ACK"};
String s1 = "Please give the Test";
String s2 = "Please give Re Test";
String s3 = "Acknowledge my work";
Now,
when I use Keywords.Where(x=>x.Contains(s1)) it gives me a match, which is correct. But s3 should not match.
Is there any workaround for this?
Split the string on spaces and compare the resulting words with the keywords.
I hope that works.
How about using regular expressions?
public static class Program
{
public static void Main(string[] args)
{
var keywords = new List<string>() { "Test", "Re Test", "ACK" };
var targets = new[] {
"Please give the Test",
"Please give Re Test",
"Acknowledge my work"
};
foreach (var target in targets)
{
Console.WriteLine($"{target}: {AnyMatches(target, keywords)}");
}
Console.ReadKey();
}
private static bool AnyMatches(string target, IEnumerable<string> keywords)
{
foreach (var keyword in keywords)
{
var regex = new Regex($"\\b{Regex.Escape(keyword)}\\b", RegexOptions.IgnoreCase);
if (regex.IsMatch(target))
return true;
}
return false;
}
}
Creating the regular expressions on the fly on every call is probably not the best option in production, so consider building a list of Regex objects from your keywords once instead of storing only the keywords in a plain string list.
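A minimal sketch of that idea, assuming the keyword list is fixed when the matcher is created. KeywordMatcher is a hypothetical class name; the pattern is the same word-boundary pattern as in AnyMatches above, just built once per keyword:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public sealed class KeywordMatcher
{
    // One pre-built Regex per keyword, created once in the constructor.
    private readonly List<Regex> _patterns;

    public KeywordMatcher(IEnumerable<string> keywords)
    {
        _patterns = keywords
            .Select(k => new Regex(
                $@"\b{Regex.Escape(k)}\b",
                RegexOptions.IgnoreCase | RegexOptions.Compiled))
            .ToList();
    }

    // True if at least one keyword occurs in the target as a whole word.
    public bool AnyMatches(string target)
    {
        return _patterns.Any(p => p.IsMatch(target));
    }
}
```

RegexOptions.Compiled trades a higher construction cost for faster matching, so it is only likely to pay off when the same matcher is reused for many strings.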
A slightly different solution:
void Main()
{
var KeyWords = new List<string>(){ "Test","Re Test","ACK" };
var array = new string[] {
"Please give the Test",
"Please give Re Test",
"Acknowledge my work"
};
foreach(var c in array)
{
Contains(c,KeyWords); // Your result.
}
}
private bool Contains(string sentence, List<string> keywords) {
var result = keywords.Select(keyWord=>{
var parts3 = Regex.Split(sentence, keyWord, RegexOptions.IgnoreCase).Where(x=>!string.IsNullOrWhiteSpace(x)).First().Split((char[])null); // Split by the keywords and get the rest of the words splitted by empty space
var splitted = sentence.Split((char[])null); // split the original string.
return parts3.Where(t=>!string.IsNullOrWhiteSpace(t)).All(x=>splitted.Any(t=>t.Trim().Equals(x.Trim(),StringComparison.InvariantCultureIgnoreCase)));
}); // Check if all remaining words from parts3 are inside the existing splitted string, thus verifying if full words.
return result.All(x=>x);// if everything matches then it was a match on full word.
}
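The idea is to split the sentence by the keyword you are looking for (e.g. split by ACK) and then check whether the remaining words also appear as whole words in the original string split on whitespace. If they all do, the keyword matched a whole word and the result is true. If the split cut a word in half, meaning the keyword was only a substring, the leftover fragment won't match any word and the result is false. Note that, as written, the method returns true whenever no keyword cuts a word in half, including for a sentence that contains none of the keywords.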
Your usage of Contains is backwards:
var foundKW = KeyWords.Where(kw => s1.Contains(kw)).ToList();
How about using a regex?
In a pattern such as \bthe\b, \b represents a word boundary, so the keyword only matches as a whole word.
List<string> KeyWords = new List<string>(){"Test","Re Test","ACK"};
String s1 = "Please give the Test";
String s2 = "Please give Re Test";
String s3 = "Acknowledge my work";
bool result = false;
foreach (string str in KeyWords)
{
// Escape the keyword and use verbatim strings so that \b stays a word boundary.
result = Regex.IsMatch(s1, @"\b" + Regex.Escape(str) + @"\b");
if (result)
break;
}

How can I extract a dynamic length string from multiline string?

I am using "nslookup" to get a machine name from an IP address.
nslookup 1.2.3.4
The output is multiline and the machine name has a variable length. How can I extract "DynamicLengthString" from the output? Every suggestion I found uses IndexOf and Split, but when I tried that approach it did not work well for me. Any advice?
Server: volvo.toyota.opel.tata
Address: 5.6.7.8
Name: DynamicLengthString.toyota.opel.tata
Address: 1.2.3.4
I did it the good old C# way, without regex.
string input = @"Server: volvo.toyota.opel.tata
Address: 5.6.7.8
Name: DynamicLengtdfdfhString.toyota.opel.tata
Address: 1.2.3.4";
string targetLineStart = "Name:";
string[] allLines = input.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
string targetLine = String.Empty;
foreach (string line in allLines)
if (line.StartsWith(targetLineStart))
{
targetLine = line;
}
System.Console.WriteLine(targetLine);
string dynamicLengthString = targetLine.Remove(0, targetLineStart.Length).Split('.')[0].Trim();
System.Console.WriteLine("<<" + dynamicLengthString + ">>");
System.Console.ReadKey();
This extracts "DynamicLengtdfdfhString" from the given input, no matter where the Name-Line is and no matter what comes afterwards.
This is the console version to test & verify it.
You can use Regex
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string Content = "Server: volvo.toyota.opel.tata \rAddress: 5.6.7.8 \rName: DynamicLengthString.toyota.opel.tata \rAddress: 1.2.3.4";
// Capture whatever follows "Name:" up to the first '.' or whitespace,
// so the pattern does not need to know the name in advance.
string Pattern = @"Name:\s*([^.\s]+)";
MatchCollection matchList = Regex.Matches(Content, Pattern);
Console.WriteLine("Running");
foreach(Match match in matchList)
{
Console.WriteLine(match.Groups[1].Value);
}
}
}
I'm going to assume your output is exactly like you put it.
string output = ExactlyAsInTheQuestion();
var fourthLine = output.Split(Environment.NewLine)[3];
var nameValue = fourthLine.Substring(9); //skips over "Name: "
var firstPartBeforePeriod = nameValue.Split('.')[0];
//firstPartBeforePeriod should equal "DynamicLengthString"
Note that this is a barebones example:
Either check all array indexes before you access them, or be prepared to catch IndexOutOfRangeExceptions.
I've assumed that the four spaces between "Name:" and "DynamicLengthString" are four spaces. If they are a tab character, you'll need to adjust the Substring(9) method to Substring(6).
If "DynamicLengthString" is supposed to also have periods in its value, then my answer does not apply. You'll need to use a regex in that case.
Note: I'm aware that you dismissed Split:
All suggestions IndexOf and Split, but when I try to do like that, I was not a good solution for me.
But based on only this description, it's impossible to know if the issue was in getting Split to work, or it actually being unusable for your situation.
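To avoid depending on the line position or on the exact number of spaces after "Name:", a minimal sketch could locate the line by its prefix and trim whatever whitespace follows. NslookupParser and ExtractHostName are hypothetical names, and the sketch assumes the host name itself contains no periods:

```csharp
using System;

public static class NslookupParser
{
    // Hypothetical helper: returns the first label of the "Name:" line,
    // or null when the output contains no such line.
    public static string ExtractHostName(string output)
    {
        string[] lines = output.Split(
            new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            string trimmed = line.Trim();
            if (!trimmed.StartsWith("Name:", StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }

            // Works for spaces and tabs alike, so no fixed offset is needed.
            string value = trimmed.Substring("Name:".Length).Trim();
            int dot = value.IndexOf('.');
            return dot >= 0 ? value.Substring(0, dot) : value;
        }

        return null;
    }
}
```

Returning null for a missing "Name:" line is just one possible convention; throwing an exception would work equally well if a missing name is considered an error.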

C# Regex to replace specific hashtags with certain block of text

I am a new C# developer and I am struggling to write a method that replaces a few specific hashtags in a sample of tweets with certain blocks of text. For example, if the tweet has a hashtag like #StPaulSchool, I want to replace it with the text "St. Paul School", without the '#'.
I have a very small list of words that I need to replace. If there is no match, I would like to remove the hashtag (replace it with an empty string).
I am using the following method to parse the tweet and convert it into a formatted tweet, but I don't know how to extend it to handle the specific hashtags. Could you please tell me how to do that?
Here's the code:
public string ParseTweet(string rawTweet)
{
Regex link = new Regex(@"http(s)?://([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?");
Regex screenName = new Regex(@"@\w+");
Regex hashTag = new Regex(@"#\w+");
var words_to_replace = new string[] { "StPaulSchool", "AzharSchool", "WarwiSchool", "ManMet_School", "BrumSchool"};
var inputWords = new string[] { "St. Paul School", "Azhar School", "Warwick School", "Man Metapolian School", "Brummie School"};
string formattedTweet = link.Replace(rawTweet, delegate (Match m)
{
string val = m.Value;
//return string.Format("URL");
return string.Empty;
});
formattedTweet = screenName.Replace(formattedTweet, delegate (Match m)
{
string val = m.Value.Trim('@');
//return string.Format("USERNAME");
return string.Empty;
});
formattedTweet = hashTag.Replace(formattedTweet, delegate (Match m)
{
string val = m.Value;
//return string.Format("HASHTAG");
return string.Empty;
});
return formattedTweet;
}
The following code works for the hashtags:
static void Main(string[] args)
{
string longTweet = @"Long sentence #With #Some schools like #AzharSchool and spread out
over two #StPaulSchool lines ";
string result = Regex.Replace(longTweet, @"\#\w+", match => ReplaceHashTag(match.Value), RegexOptions.Multiline);
Console.WriteLine(result);
}
private static string ReplaceHashTag(string input)
{
switch (input)
{
case "#StPaulSchool": return "St. Paul School";
case "#AzharSchool": return "Azhar School";
default:
return input; // hashtag not recognized
}
}
If the list of hashtags to convert becomes very long, it would be more succinct to use a Dictionary, e.g.:
private static Dictionary<string, string> _hashtags
= new Dictionary<string, string>
{
{ "#StPaulSchool", "St. Paul School" },
{ "#AzharSchool", "Azhar School" },
};
and rewrite the body of the ReplaceHashTag method like this (input is the method's parameter):
if (!_hashtags.ContainsKey(input))
{
return input;
}
return _hashtags[input];
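Putting the dictionary and the regex together, and following the question's requirement that unrecognized hashtags be removed rather than kept, one possible sketch looks like this. HashtagExpander and Expand are hypothetical names, and the case-insensitive key comparison is an assumption, not something the question specifies:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HashtagExpander
{
    // Known hashtags and their replacement text; keys are compared ignoring case.
    private static readonly Dictionary<string, string> Replacements =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "#StPaulSchool", "St. Paul School" },
            { "#AzharSchool", "Azhar School" },
        };

    // Replaces known hashtags with their text and removes all other hashtags.
    public static string Expand(string tweet)
    {
        return Regex.Replace(tweet, @"#\w+", match =>
            Replacements.TryGetValue(match.Value, out string replacement)
                ? replacement
                : string.Empty);
    }
}
```

TryGetValue does the lookup once, where ContainsKey followed by the indexer does it twice. Removing a hashtag leaves the spaces around it untouched, so the result may contain double spaces.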
I believe that using regular expressions makes this code unreadable and difficult to maintain. Moreover, you are using a regular expression to find a very simple pattern: strings that start with the hashtag (#) character.
I suggest a different approach: Break the sentence into words, transform each word according to your business rules, then join the words back together. Although this sounds like a lot of work, and it may be the case in another language, the C# String class makes this quite easy to implement.
Here is a basic example of a console application that does the requested functionality, the business rules are hard-coded, but this should be enough so you could continue:
static void Main(string[] args)
{
string text = "Example #First #Second #NoMatch not a word ! \nSecond row #Second";
string[] wordsInText = text.Split(' ');
IEnumerable<string> transformedWords = wordsInText.Select(selector: word => ReplaceHashTag(word: word));
string transformedText = string.Join(separator: " ", values: transformedWords);
Console.WriteLine(value: transformedText);
}
private static string ReplaceHashTag(string word)
{
if (!word.StartsWith(value: "#"))
{
return word;
}
string wordWithoutHashTag = word.Substring(startIndex: 1);
if (wordWithoutHashTag == "First")
{
return "FirstTransformed";
}
if (wordWithoutHashTag == "Second")
{
return "SecondTransformed";
}
return string.Empty;
}
Note that this approach gives you much more flexibility in chaining your logic, and with small modifications you can make this code a lot more testable and incremental than the regular expression approach.

C# - How can stop recursive function?

I want to replace all the words that start with # with another word. Here is my code:
public string SemiFinalText { get; set; }
public string FinalText { get; set; }
//sample text : "aaaa bbbb #cccc dddd #eee fff g"
public string GetProperText(string text)
{
if (text.Contains('#'))
{
int index = text.IndexOf('#');
string restText = text.Substring(index);
var indexLast = restText.IndexOf(' ');
var oldName = text.Substring(index, indexLast);
string restText2 = text.Substring( index + indexLast);
SemiFinalText += text.Substring(0, index + indexLast).Replace(oldName, "#New");
if (restText2.Contains('#'))
{
GetProperText(restText2);
}
FinalText = SemiFinalText + restText2;
return FinalText;
}
else
{
return text;
}
}
When return FinalText; is executed, I want the recursive function to stop. How can I fix it?
Maybe another approach is better than a recursive function. If you know another way, please post an answer.
You don't need a recursive solution for this problem. You have a string containing a number of words (separated by spaces) and you want to replace the ones starting with '#' with another string. You can modify your solution into a simple method that splits on spaces, replaces every word starting with #, and then joins the words back together.
Using Linq:
string text = "aaaa bbbb #cccc dddd #eee fff g";
FinalText = GetProperText(text, "New");
public string GetProperText(string text, string replacewith)
{
text = string.Join(" ", text.Split(' ').Select(x => x.StartsWith("#") ? replacewith: x));
return text;
}
Output: aaaa bbbb New dddd New fff g
Using Regex:
Regex rgx = new Regex("#([^ #])*");
string result = rgx.Replace(text, replacewith);
Solution with Regular Expressions:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string pattern = @"#\w+";
var r = new Regex(pattern);
Console.WriteLine(r.Replace("ABC #ABC ABC #DEF klm.#bhsh", "BOOM!"));
}
}
This does not rely on the space character being the delimiter; any non-word character (anything other than a letter, digit, or underscore) separates the 'words'. This example outputs:
ABC BOOM! ABC BOOM! klm.BOOM!
You can test it out here: https://dotnetfiddle.net/rZyjjg
If you're new to Regex: .NET Introduction to Regular Expressions
Here is also a proper way to do it recursively, for anyone interested. I think your stopping condition was actually OK, but you should concatenate the result of the recursive call onto the already processed text. Also, I think that using global variables in a recursive function defeats its purpose a little.
That being said, I think the Regex approach from one of the other answers is better and faster.
The recursive code:
//sample text : "aaaa bbbb #cccc dddd #eee fff g"
public string GetProperText(string text)
{
    int index = text.IndexOf('#'); // Index of the first '#'
    if (index < 0)
    {
        return text; // Base case: nothing left to replace
    }
    string processedText = text.Substring(0, index) + "New"; // Text before '#' plus the new name
    int indexLast = text.IndexOf(' ', index); // Index of the first ' ' after '#'
    if (indexLast < 0)
    {
        return processedText; // The '#' word is the last word in the text
    }
    string restText = text.Substring(indexLast); // Everything after the '#' word
    // The outcome of the recursive call is appended to the already processed part.
    return processedText + GetProperText(restText);
}

how can i trim the string in c# after each file extension name

I have a string of attachments like this:
"SharePoint_Health Check Assessment.docx<br>Test Workflow.docx<br>" .
and I used this method:
AttachmentName = System.Text.RegularExpressions.Regex.Replace(AttachmentName, @"<(.|\n)*?>", "String.Empty");
and I got this result:
SharePoint_Health Check Assessment.docxTest Workflow.docx
How can I split the string in C# and get each file name separately, like:
SharePoint_Health Check Assessment.docx
Test Workflow.docx
and then show them into some control one by one.
and after that i want just the URL of the string like
"http://srumos1/departments/Attachments/2053_3172016093545_ITPCTemplate.txt"
and
"http://srumos1/departments/Attachments/2053_3172016093545_ITPCTemplate.txt"
how can i do that
I got it this way
AttachmentName = Regex.Replace(AttachmentName, @"<(.|\n)*?>", string.Empty);
Well, there's your problem. You had valid delimiters but stripped them away. Leave the delimiters in place and use String.Split to split on them.
Or replace the HTML with a delimiter instead of an empty string:
AttachmentName = Regex.Replace(AttachmentName, @"<(.|\n)*?>", "|");
And then split based off of that:
string[] filenames = AttachmentName.Split(new [] {'|'},
StringSplitOptions.RemoveEmptyEntries);
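The first suggestion, keeping the delimiters and splitting on them directly, might look like the sketch below. It assumes the separator is always the literal text <br> (not <br/> or <BR>); AttachmentSplitter and SplitNames are hypothetical names:

```csharp
using System;

public static class AttachmentSplitter
{
    // Splits the raw attachment string on "<br>" and drops the empty
    // entry produced by the trailing delimiter.
    public static string[] SplitNames(string attachmentHtml)
    {
        return attachmentHtml.Split(
            new[] { "<br>" }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Unlike the "|" variant, this does not break if a file name happens to contain the pipe character.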
You can use a regex to extract the file names if you have no other clear way to do that. Can you try the code below?
using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;
using System.Text.RegularExpressions;
namespace ExtensionExtractingTest
{
class Program
{
static void Main(string[] args)
{
string fileNames = "test.docxtest2.txttest3.pdftest.test.xlxtest.docxtest2.txttest3.pdftest.test.xlxtest.docxtest2.txttest3.pdftest.test.xlxourtest.txtnewtest.pdfstackoverflow.pdf";
//Add your extensions to regex definition
Regex fileNameMatchRegex = new Regex(@"[a-zA-Z0-9]*(\.txt|\.pdf|\.docx|\.xlx)", RegexOptions.IgnoreCase);
MatchCollection matchResult = fileNameMatchRegex.Matches(fileNames);
List<string> fileNamesList = new List<string>();
foreach (Match item in matchResult)
{
fileNamesList.Add(item.Value);
}
fileNamesList = fileNamesList.Distinct().ToList();
Console.WriteLine(string.Join(";", fileNamesList));
}
}
}
And a working example is here http://ideone.com/gbopSe
PS: Keep in mind that you have to know your file name extensions in advance; otherwise you have to guess that extensions are 3 or 4 characters long, and that makes for painful string parsing.
Hope this helps.
