Conditional Split String with multiple delimiters

I have a string:
string astring = "#This is a Section*This is the first category*This is the second Category# This is another Section";
I want to split this string according to its delimiters. A part that starts with # is a section (string[] section), and a part that starts with * is a category (string[] category).
As a result I want to have:
string[] section = { "This is a Section", "This is another Section" };
string[] category = { "This is the first category ", "This is the second Category " };
I have found this answer:
string.split - by multiple character delimiter
But it is not what I am trying to do.

string astring = @"#This is a Section*This is the first category*This is the second Category# This is another Section";
string[] sections = Regex.Matches(astring, @"#([^\*#]*)").Cast<Match>()
    .Select(m => m.Groups[1].Value).ToArray();
string[] categories = Regex.Matches(astring, @"\*([^\*#]*)").Cast<Match>()
    .Select(m => m.Groups[1].Value).ToArray();

With string.Split you could do this (faster than the regex ;) ):
List<string> sectionsResult = new List<string>();
List<string> categoriesResult = new List<string>();
string astring = "#This is a Section*This is the first category*This is the second Category# This is another Section";
// Each non-empty chunk between '#' characters holds one section followed by its categories.
var sections = astring.Split('#').Where(i => !String.IsNullOrEmpty(i));
foreach (var section in sections)
{
    // The first part is the section title; the remaining parts are its categories.
    var sectionAndCategories = section.Split('*');
    sectionsResult.Add(sectionAndCategories.First());
    categoriesResult.AddRange(sectionAndCategories.Skip(1));
}

Related

Dynamically concatenate value in list if pattern matched

I have a list of string and an array of pattern
List<string> filePaths = Directory.GetFiles(dir, filter).ToList();
string[] prefixes = { "0.", "1.", "2.", "3.", "4.", "5.", "6.", "7.", "8.", "9." };
I want to replace value in filePaths for example like this:
"1. fileA" becomes "01. fileA"
"2. fileB" becomes "02. fileB"
"10. fileC" becomes "10. fileC" (since "10." is not in prefixes list)
Is there a way to do this without looping?

You can do the following, using Select:
class Program
{
static void Main(string[] args)
{
string[] prefixes = { "0.", "1.", "2.", "3.", "4.", "5.", "6.", "7.", "8.", "9." };
var result = Directory.GetFiles(dir, filter).Select(s => prefixes.Contains(s.Substring(0, 2)) ? "0" + s : s).ToList();
}
}
Select projects each element: if it starts with one of the prefixes, it is padded with a leading "0"; otherwise the original value is returned.

No need for a prefixes list, you can just pad left with 0's using regex:
string input = "1. fileA";
string result = Regex.Replace(input, @"^\d+", m => m.Value.PadLeft(2, '0'));
To use on the whole list:
var filePaths = Directory.GetFiles(dir, filter).Select(s => Regex.Replace(s, @"^\d+", m => m.Value.PadLeft(2, '0'))).ToList();
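One thing to watch: Directory.GetFiles returns paths that include the directory, so a pattern anchored with ^ sees the start of the path rather than the start of the file name. A minimal sketch of one way around that, assuming the padding should apply to the file name only (the sample paths are hypothetical and stand in for the GetFiles result):

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical paths standing in for the result of Directory.GetFiles(dir, filter).
string[] paths = { "docs/1. fileA", "docs/2. fileB", "docs/10. fileC" };

// Take the file name first, then left-pad its leading digits to two characters.
var padded = paths
    .Select(p => Path.GetFileName(p))
    .Select(name => Regex.Replace(name, @"^\d+", m => m.Value.PadLeft(2, '0')))
    .ToList();

foreach (var name in padded)
    Console.WriteLine(name);
```

Names that already have two or more leading digits, such as "10. fileC", pass through unchanged because PadLeft only pads up to the requested width.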

Check if a particular string is contained in a list of strings

I'm trying to search a string to see if it contains any strings from a list:
var s = driver.FindElement(By.Id("list"));
var innerHtml = s.GetAttribute("innerHTML");
innerHtml is the string I want to search for any of the strings in a list that I provide, for example:
var list = new List<string> { "One", "Two", "Three" };
So if innerHtml contains "One", the output should be Match: One

You can do this in the following way:
int result = list.IndexOf(innerHtml);
It returns the index of the list item that equals innerHtml, or -1 if there is none. Note that this compares each item with the whole string; it does not search inside innerHtml.
If you want a string output, as mentioned in the question, you may do something like:
if (result != -1)
    Console.WriteLine(list[result] + " matched.");
else
    Console.WriteLine("No match found");
Another simple way to do this is:
string matchedElement = list.Find(x => x.Equals(innerHtml));
This returns the matched element if there is a match; otherwise it returns null.
See docs for more details.

You can do it with LINQ by applying Contains to innerHtml for each of the items on the list:
var matches = list.Where(item => innerHtml.Contains(item)).ToList();
Variable matches would contain a subset of strings from the list which are matched inside innerHtml.
Note: This approach does not match at word boundaries, which means that you would find a match of "One" when innerHtml contains "Onerous".
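If whole-word matches are wanted, one option is a regex with \b word boundaries around each item. This is a minimal sketch, not part of the original answer; the sample innerHtml value is hypothetical, and \b only behaves as expected when the items start and end with word characters:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample standing in for the element's innerHTML.
string innerHtml = "<li>Onerous</li><li>Three</li>";
var list = new List<string> { "One", "Two", "Three" };

// \b anchors the match at word boundaries; Regex.Escape guards against
// list items that contain regex metacharacters.
var matches = list
    .Where(item => Regex.IsMatch(innerHtml, @"\b" + Regex.Escape(item) + @"\b"))
    .ToList();

foreach (var match in matches)
    Console.WriteLine("Match: " + match);
```

With this input, a plain Contains check would report both "One" and "Three", while the word-boundary version reports only "Three".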

foreach(var str in list)
{
if (innerHtml.Contains(str))
{
// match found, do your stuff.
}
}
String.Contains documentation

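For those who want to search for any of a list of wildcard strings in another list of strings:
List<string> WildCard = new() { "", "%", "?" };
List<string> PlateNo = new() { "13eer", "rt4444", "45566" };
// Note: the empty string "" is contained in every string, so with it in WildCard this check is always true.
if (WildCard.Any(x => PlateNo.Any(y => y.Contains(x))))
    Console.WriteLine("Plate has wildcard");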

C# How to avoid splitting names in .Split()?

I have a loop that iterates over each sentence in processedSentencesList and scans it for words that exist in the list entityString. Each entity found in a sentence is added to valid_words.
But the entities "Harry Potter" and "Ford Car" do not get added because of the sentence.Split() call.
How do I alter this code so that entities containing spaces are not split into two words?
List <string> entityString = new List<string>();
entityString.Add("Harry Potter"); //A name which i do not want to split
entityString.Add("Ford Car"); //A name which i do not want to split
entityString.Add("Broom");
entityString.Add("Ronald");
List <string> processedSentencesList = new List<string>();
processedSentencesList.Add("Harry Potter is a wizard");
processedSentencesList.Add("Ronald had a Broom and a Ford Car");
foreach (string sentence in processedSentencesList)
{
var words = sentence.Split(" ".ToCharArray());
//But it splits the names as well
var valid_words = words.Where(w =>
    entityString.Any(en_li => en_li.Equals(w)));
//And therefore my names do not get added to the valid_words list
}
When printed, Output I get right now:
Broom
Ronald
Output I expect:
Harry Potter
Ford Car
Broom
Ronald
Basically, entities that contain spaces (two or more words) get split apart and thus cannot be matched against the existing entities. How do I fix this?

Replace your foreach with this:
List<String> valid_words = new List<String>();
foreach (string sentence in processedSentencesList)
{
valid_words.AddRange(entityString.Where(en_li => sentence.Contains(en_li)));
}
valid_words = valid_words.Distinct().ToList();

You could try matching instead of splitting.
[A-Z]\S+(?:\s+[A-Z]\S+)?

You could loop through each item and use the 'String.Contains()' method, which will prevent you from having to split your search strings.
Example:
List<string> valid_words = new List<string>();
foreach (string sentence in processedSentencesList)
{
foreach (string entity in entityString)
{
if (sentence.Contains(entity))
{
valid_words.Add(entity);
}
}
}

extracting the common prefixes from a list of strings

I have a list of strings, such as:
{ abc001, abc002, abc003, cdef001, cdef002, cdef004, ghi002, ghi001 }
I want to get all the common unique prefixes; for example, for the above list:
{ abc, cdef, ghi }
How do I do that?

var list = new List<String> {
"abc001", "abc002", "abc003", "cdef001",
"cdef002", "cdef004", "ghi002", "ghi001"
};
var prefixes = list.Select(x => Regex.Match(x, @"^[^\d]+").Value).Distinct();

It may be a good idea to write a helper class to represent your data. For example:
public class PrefixedNumber
{
private static Regex parser = new Regex(@"^(\p{L}+)(\d+)$");
public PrefixedNumber(string source) // you may want a static Parse method.
{
Match parsed = parser.Match(source); // think about an error here when it doesn't match
Prefix = parsed.Groups[1].Value;
Index = parsed.Groups[2].Value;
}
public string Prefix { get; set; }
public string Index { get; set; }
}
You need to come up with a better name, of course, and better access modifiers.
Now the task is quite easy:
List<string> data = new List<string> { "abc001", "abc002", "abc003", "cdef001",
"cdef002", "cdef004", "ghi002", "ghi001" };
var groups = data.Select(str => new PrefixedNumber(str))
.GroupBy(prefixed => prefixed.Prefix);
The result is all data, parsed, and grouped by the prefix.

You can achieve that by using a regular expression to select the text part, and then adding that text part to a HashSet<string> so that no duplicates are stored:
using System.Text.RegularExpressions;
//simulate your real list
List<string> myList = new List<string>(new string[] { "abc001", "abc002", "cdef001" });
string pattern = @"^(\D*)\d+$";
// \D* matches any non-digit characters, and \d+ means followed by at least one digit.
// Note: if you also want to capture a string like "abc" alone, without trailing digits,
// then the pattern would be @"^(\D*)\d*$"
Regex regex = new Regex(pattern);
HashSet<string> matchesStrings = new HashSet<string>();
foreach (string item in myList)
{
var match = regex.Match(item);
if (match.Success)
{
    matchesStrings.Add(match.Groups[1].Value);
}
}
Result:
abc, cdef

Assuming that your prefix is all alphabetic characters and is terminated by the first non-alphabetic character, you could use the following LINQ expression:
List<string> listOfStrings = new List<String>()
{ "abc001d", "abc002", "abc003", "cdef001", "cdef002", "cdef004", "ghi002", "ghi001" };
var prefixes = (from s in listOfStrings
select new string(s.TakeWhile(c => char.IsLetter(c)).ToArray())).Distinct();

What's wrong with this ForEach loop?

Yep... it's one of those days.
public string TagsInput { get; set; }
//further down
var tagList = TagsInput.Split(Resources.GlobalResources.TagSeparator.ToCharArray()).ToList();
tagList.ForEach(tag => tag.Trim()); //trim each list item for spaces
tagList.ForEach(tag => tag.Replace(" ", "_")); //replace remaining inner word spacings with _
Both ForEach loops don't work. tagList is just a List<string>.
Thank you!

Trim() and Replace() don't modify the string they're called on. They create a new string that has had the action applied to it.
You want to use Select, not ForEach.
tagList = tagList.Select(t => t.Trim()).Select(t => t.Replace(" ", "_")).ToList();

ForEach (like the LINQ methods) does not replace the elements of the list. Build a new list instead:
tagList = tagList.Select(tag => tag.Trim().Replace(" ", "_")).ToList();

The reason is that string is immutable, so each call to Trim() or Replace() produces a new string. You need to assign the result back to the original element in order to see the updated value.
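A minimal sketch of that point, using a hypothetical two-element tag list: the first call discards the new strings, while the index-based loop writes them back into the list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample tags, not taken from the question.
var tagList = new List<string> { " first tag ", "second tag" };

// The strings returned by Trim() are discarded, so the list is unchanged.
tagList.ForEach(tag => tag.Trim());
Console.WriteLine("[" + tagList[0] + "]");

// Writing each new string back by index updates the list in place.
for (int i = 0; i < tagList.Count; i++)
{
    tagList[i] = tagList[i].Trim().Replace(" ", "_");
}
Console.WriteLine(string.Join(",", tagList));
```

The first line printed still shows the surrounding spaces; the second shows the trimmed, underscore-joined tags.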

This is exactly why Microsoft haven't implemented ForEach on IEnumerable. What's wrong with this?
public string TagsInput { get; set; }
//further down
var adjustedTags = new List<string>();
foreach (var tag in TagsInput.Split(Resources.GlobalResources.TagSeparator.ToCharArray()))
{
    adjustedTags.Add(tag.Trim().Replace(" ", "_"));
}
string[] tags = adjustedTags.ToArray();

If by "don't work" you mean that they don't actually change anything, you need to assign the results back to the list. Assigning to the lambda parameter inside ForEach would only change the local copy, so use ConvertAll (or Select) and reassign:
public string TagsInput { get; set; }
//further down
var tagList = TagsInput.Split(Resources.GlobalResources.TagSeparator.ToCharArray()).ToList();
tagList = tagList.ConvertAll(tag => tag.Trim()); //trim each list item for spaces
tagList = tagList.ConvertAll(tag => tag.Replace(" ", "_")); //replace remaining inner word spacings with _
Trim and Replace don't change the value of the string; they return the new string value.
