C#: How to avoid splitting names with .Split()?

So basically I have this loop where each sentence in processedSentencesList is iterated over and scanned for words that exist in the list entityString. Each entity found in a sentence is added to var valid_words.
But the entities "Harry Potter" and "Ford Car" do not get added because of the sentence.Split() call.
How do I alter this code so that entities containing spaces are not split into two words?
List<string> entityString = new List<string>();
entityString.Add("Harry Potter"); // A name which I do not want to split
entityString.Add("Ford Car");     // A name which I do not want to split
entityString.Add("Broom");
entityString.Add("Ronald");
List<string> processedSentencesList = new List<string>();
processedSentencesList.Add("Harry Potter is a wizard");
processedSentencesList.Add("Ronald had a Broom and a Ford Car");
foreach (string sentence in processedSentencesList)
{
    var words = sentence.Split(" ".ToCharArray());
    // But it splits the names as well
    var valid_words = words.Where(w =>
        entityString.Any(en_li => en_li.Equals(w)));
    // And therefore my names do not get added to the valid_words list
}
When printed, the output I get right now is:
Broom
Ronald
The output I expect:
Harry Potter
Ford Car
Broom
Ronald
Basically, entities with spaces in them (two or more words) get split apart and therefore cannot be matched to the existing entities. How do I fix this?

Replace your foreach with this:
List<String> valid_words = new List<String>();
foreach (string sentence in processedSentencesList)
{
    valid_words.AddRange(entityString.Where(en_li => sentence.Contains(en_li)));
}
valid_words = valid_words.Distinct().ToList();
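The same logic can also be written as a single LINQ query. This is a minimal sketch that reuses the question's sample data. It still relies on String.Contains, so it behaves the same as the loop: ordinal, case-sensitive substring matching.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var entityString = new List<string> { "Harry Potter", "Ford Car", "Broom", "Ronald" };
var processedSentencesList = new List<string>
{
    "Harry Potter is a wizard",
    "Ronald had a Broom and a Ford Car"
};

// For every sentence, keep the entities it contains (substring match),
// flatten the per-sentence results into one sequence, then drop duplicates.
List<string> valid_words = processedSentencesList
    .SelectMany(sentence => entityString.Where(en_li => sentence.Contains(en_li)))
    .Distinct()
    .ToList();

foreach (var word in valid_words)
{
    Console.WriteLine(word); // Harry Potter, Ford Car, Broom, Ronald (one per line)
}
```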

You could try matching instead of splitting.
[A-Z]\S+(?:\s+[A-Z]\S+)?
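Here is a minimal sketch of the matching idea applied to the question's two sentences. The pattern picks up any run of one or two capitalized words, not only the known entities, so the sketch filters the matches against entityString afterwards. As written, the pattern also cannot find entities longer than two words, entities that start with a lowercase letter, or single-letter words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var entityString = new List<string> { "Harry Potter", "Ford Car", "Broom", "Ronald" };
var processedSentencesList = new List<string>
{
    "Harry Potter is a wizard",
    "Ronald had a Broom and a Ford Car"
};

// One capitalized word, optionally followed by a second capitalized word.
var candidatePattern = new Regex(@"[A-Z]\S+(?:\s+[A-Z]\S+)?");

var valid_words = new List<string>();
foreach (string sentence in processedSentencesList)
{
    foreach (Match m in candidatePattern.Matches(sentence))
    {
        // Keep only candidates that are actually known entities.
        if (entityString.Contains(m.Value))
        {
            valid_words.Add(m.Value);
        }
    }
}

Console.WriteLine(string.Join(", ", valid_words)); // Harry Potter, Ronald, Broom, Ford Car
```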

You could loop through each entity and use the String.Contains() method. That way you never have to split the sentences you are searching.
Example:
List<string> valid_words = new List<string>();
foreach (string sentence in processedSentencesList)
{
    foreach (string entity in entityString)
    {
        if (sentence.Contains(entity))
        {
            valid_words.Add(entity);
        }
    }
}
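String.Contains has two side effects worth knowing about. First, it matches substrings, so a hypothetical entity "Ron" would also be found inside "Ronald". Second, the nested loop adds an entity once for every sentence it appears in. The sketch below matches whole words only, using a \b-anchored regular expression built with Regex.Escape, and uses a HashSet to skip duplicates. The extra "Ron" entity and the third sentence are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// "Ron" and the third sentence are hypothetical additions that show the difference.
var entityString = new List<string> { "Harry Potter", "Ford Car", "Broom", "Ronald", "Ron" };
var processedSentencesList = new List<string>
{
    "Harry Potter is a wizard",
    "Ronald had a Broom and a Ford Car",
    "Harry Potter flew the Ford Car"
};

var valid_words = new List<string>();
var seen = new HashSet<string>(); // remembers entities that were already added

foreach (string sentence in processedSentencesList)
{
    foreach (string entity in entityString)
    {
        // \b ... \b requires the entity to appear as whole word(s), so "Ron"
        // does not match inside "Ronald". Regex.Escape protects entities that
        // contain regex metacharacters such as '.' or '+'.
        string pattern = @"\b" + Regex.Escape(entity) + @"\b";
        if (Regex.IsMatch(sentence, pattern) && seen.Add(entity))
        {
            valid_words.Add(entity);
        }
    }
}

Console.WriteLine(string.Join(", ", valid_words)); // Harry Potter, Ford Car, Broom, Ronald
```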

Related

How to compare a list of strings to a string when elements in the list might have their letters scrambled?

I'm trying to write a lambda expression to compare members of a list to a string, but I also need to catch elements in the list whose letters might be scrambled.
Here's the code I have right now:
List<string> listOfWords = new List<String>() { "abc", "test", "teest", "tset"};
var word = "test";
var results = listOfWords.Where(s => s == word);
foreach (var i in results)
{
    Console.Write(i);
}
So this code will find the string "test" in the list and print it out, but I also want it to catch cases like "tset". Is this easy to do with LINQ, or do I have to use loops?
How about sorting the letters and seeing if the resulting sorted sequences of chars are equal?
var wordSorted = word.OrderBy(c=>c);
listOfWords.Where(w => w.OrderBy(c=>c).SequenceEqual(wordSorted));
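Below is a runnable version of the sorting idea using the question's list, with one suggested change. `word.OrderBy(c => c)` is a deferred query, so without `.ToArray()` it gets re-sorted for every element that is compared. Materializing it once avoids that. The comparison remains case-sensitive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<string> listOfWords = new List<string>() { "abc", "test", "teest", "tset" };
var word = "test";

// Sort the target's letters once. Two words are anagrams of each other
// exactly when their sorted letter sequences are equal.
char[] wordSorted = word.OrderBy(c => c).ToArray();

var results = listOfWords
    .Where(w => w.OrderBy(c => c).SequenceEqual(wordSorted))
    .ToList();

foreach (var i in results)
{
    Console.WriteLine(i); // test, then tset
}
```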

Get files with the same/similar name from an array

I have multiple objects in an array whose names have the format:
id_name_date_filetype.
I need to take all the objects with, let's say, the same id or the same name and insert them into a new array.
With the GetFiles method I already have all the objects in one array and I have their names, but I don't know how to differentiate them.
I have a foreach in which I'll go through all the objects, but I'm kind of stuck.
Any hints as to what I should do?
// Process the files
string[] filelist = Directory.GetFiles(SourceDirectory, "*.tsv*", SearchOption.TopDirectoryOnly).Select(filename => Path.GetFullPath(filename)).Distinct().ToArray();
foreach (string file in filelist)
{
    string[] fileNameSplit = file.Split('_');
    switch (fileNameSplit.Last().ToLower())
    {
        case "assets.tsv":
            assets = ReadDataFromCsv<Asset>(file);
            break;
        case "financialaccounts.tsv":
            financialAccounts = ReadDataFromCsv<FinancialAccount>(file);
            break;
        case "households.tsv":
            households = ReadDataFromCsv<Household>(file);
            break;
        case "registrations.tsv":
            registrations = ReadDataFromCsv<Registration>(file);
            break;
        case "representatives.tsv":
            representatives = ReadDataFromCsv<Representative>(file);
            break;
    }
}
// Find all files from one firm and insert them in a list
foreach (string file in filelist)
{
}
Here is a LINQ approach, as I proposed in my comment:
First, get all distinct IDs from your filelist:
string[] allDistinctIDs = filelist.Select(x => x.Split('_').First()).Distinct().ToArray();
Now you can iterate through the list of IDs and compare each value:
for (int i = 0; i < allDistinctIDs.Length; i++)
{
    string[] allSameIDStrings = filelist.Where(x => x.Split('_').First() == allDistinctIDs[i]).ToArray();
}
Basically, you split every item by '_' and compare the first part of the string (the ID) with each item from your list of distinct IDs.
Another approach would be to use GroupBy.
// example input
string[] filelist = {
    "123_Name1_xxx_Assets.tsv",
    "456_Name2_xxx_Assets.tsv",
    "123_Name3_xxx_Households.tsv",
    "456_Name4_xxx_Households.tsv"};
IEnumerable<IGrouping<string, string>> ID_Groups = filelist.GroupBy(x => x.Split('_').First());
This gives you a collection of all file names grouped by ID.
Each element of ID_Groups holds the file names that share the same ID, and you can filter them by file name:
foreach (var id_group in ID_Groups)
{
    assets = ReadDataFromCsv<Asset>(id_group.FirstOrDefault(x => x.ToLower().Contains("assets.tsv")));
    // and so on
    households = ReadDataFromCsv<Household>(id_group.FirstOrDefault(x => x.ToLower().Contains("households.tsv")));
}
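One detail worth checking: in the question, filelist holds full paths (from Path.GetFullPath). That means x.Split('_').First() returns the directory plus the ID, or just part of the directory if its name contains an underscore. The sketch below groups on the file name only, using Path.GetFileName and ToLookup. The "data_in" directory and the sample file names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical full paths, shaped like the question's id_name_date_filetype names.
string[] filelist =
{
    Path.Combine("data_in", "123_Name1_xxx_Assets.tsv"),
    Path.Combine("data_in", "456_Name2_xxx_Assets.tsv"),
    Path.Combine("data_in", "123_Name3_xxx_Households.tsv"),
    Path.Combine("data_in", "456_Name4_xxx_Households.tsv"),
};

// Group by the ID, i.e. the part of the *file name* before the first '_'.
// Splitting the full path instead would return "data" as the ID for every file here.
ILookup<string, string> filesById =
    filelist.ToLookup(path => Path.GetFileName(path).Split('_')[0]);

foreach (IGrouping<string, string> idGroup in filesById)
{
    Console.WriteLine($"ID {idGroup.Key}:");
    foreach (string path in idGroup)
    {
        Console.WriteLine("  " + Path.GetFileName(path));
    }
}
```

A lookup also lets you fetch one ID directly, e.g. `filesById["123"]`, which returns an empty sequence rather than throwing when the ID is missing.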
You've got to define what "similar" means to you. Is it the initial letter of the file name? Half of it? The whole file name?
This code should do more or less what you want without using LINQ or anything more complex than loops.
// Assumption: the files are read via a DirectoryInfo for the question's SourceDirectory.
// FileInfo has no id property, so the ID is taken from the name (the part before the first '_').
FileInfo[] files = new DirectoryInfo(SourceDirectory).GetFiles();
var IDOffileNameIWant = files[0].Name.Split('_')[0];
List<string> arrayThatContainsSimilar = new List<string>();
foreach (var file in files)
{
    if (file.Name.Split('_')[0].Contains(IDOffileNameIWant))
    {
        arrayThatContainsSimilar.Add(file.Name);
    }
}
It's very basic and can be refined, but you need to give more details about the exact result you want to obtain.
Since you're still struggling, here's a working example:
List<string> files = new List<string>() {
    "123_novica_file1", "123_novica_file3", "123_novica_file2", "456_myfilename_file1",
    "789_myfilename_file1", "101_novica_file2", "102_novica_file3"};
List<string> filesbyID = new List<string>();
List<string> filesbyName = new List<string>();
string theIDPattern = "123";
string theFileNamePattern = "myfilename";
foreach (var file in files)
{
    // splitting the filename and checking by ID
    if (file.Split('_')[0].Contains(theIDPattern))
    {
        filesbyID.Add(file);
    }
    // splitting the filename and checking by name
    if (file.Split('_')[1].Contains(theFileNamePattern))
    {
        filesbyName.Add(file);
    }
}
Result:
files by id:
123_novica_file1
123_novica_file3
123_novica_file2
files by name:
456_myfilename_file1
789_myfilename_file1

Split Lists into Sublists, List.Contains() does not find match

I have 2 Lists, Planets and Favorites. Each item contains multiple words separated by spaces.
I split the Lists by space into Sublists.
Now I want to check if Planets contains a Name from Favorites.
But Planets.Contains() does not find a match.
http://rextester.com/YLOG10363
// Planets List
//
List<string> Planets = new List<string>();
Planets.Add("First Mercury Gray");
Planets.Add("Second Venus Yellow");
Planets.Add("Third Earth Blue");
Planets.Add("Fourth Mars Red");
// Favorites List
//
List<string> Favorites = new List<string>();
Favorites.Add("Venus Hot");
Favorites.Add("Mars Cold");
// Sublists
//
string[] arrPlanets = null;
string[] arrFavorites = null;
List<string> Order = new List<string>();
List<string> Names = new List<string>();
List<string> Colors = new List<string>();
// In each Line of Planets & Favorites Lists, Split by Space
// Add Word to its Sublist
//
for (int i = 0; i < Planets.Count; i++)
{
    // Create Planet Sublists
    arrPlanets = Convert.ToString(Planets[i]).Split(' ');
    Order.Add(arrPlanets[0]);
    Names.Add(arrPlanets[1]);
    Colors.Add(arrPlanets[2]);
    // Create Favorites Sublist
    // Prevent Favorites index from going out of range
    if (i < Favorites.Count())
    {
        arrFavorites = Convert.ToString(Favorites[i]).Split(' ');
        // Display Message if Planets List Contains a Name from Favorites
        //
        if (Planets.Contains(arrFavorites[0]))
        {
            Console.WriteLine("Favorite Detected.");
        }
    }
}
OK, I've looked at all the answers, and you might want to look at this possibility as well, since it represents the smallest amount of refactoring to your code:
Replace
if (Planets.Contains(arrFavorites[0]))
With
if (Planets.Any(p => p.Contains(arrFavorites[0])))
It's not the most performant option, since there are better algorithms for checking matching terms. But looking at your code, performance doesn't seem to be your main concern, so this approach may well be enough.
Hope that helps.
It really isn't very clear what you are trying to accomplish, but since you said you want to find out if Planets contains any name from Favorites, and assuming the name is always the first word in each favorite,
var PlanetsHasFavorite = Planets.Any(p => Favorites.Select(f => f.Split(' ')[0]).Any(f => p.Split(' ').Contains(f)));
PlanetsHasFavorite will be true if any Planet matches a name from Favorites.
Assuming you meant you actually want to get a list of the matching planets,
var PlanetsAreFavorite = Planets.Where(p => Favorites.Select(f => f.Split(' ')[0]).Any(f => p.Split(' ').Contains(f))).ToList();
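Here is a runnable sketch of the same idea with one change: the favorite names are split once into a HashSet<string>, instead of being re-split for every planet. As in the answer, it assumes the name is the first word of each favorite. Matching is whole-word and case-sensitive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<string> Planets = new List<string>
{
    "First Mercury Gray", "Second Venus Yellow", "Third Earth Blue", "Fourth Mars Red"
};
List<string> Favorites = new List<string> { "Venus Hot", "Mars Cold" };

// Assumption (as in the answer): the planet name is the first word of each favorite.
var favoriteNames = new HashSet<string>(Favorites.Select(f => f.Split(' ')[0]));

// A planet is a favorite if any of its words is one of the favorite names.
List<string> PlanetsAreFavorite = Planets
    .Where(p => p.Split(' ').Any(word => favoriteNames.Contains(word)))
    .ToList();

bool PlanetsHasFavorite = PlanetsAreFavorite.Count > 0;

Console.WriteLine(PlanetsHasFavorite);                     // True
Console.WriteLine(string.Join(" | ", PlanetsAreFavorite)); // Second Venus Yellow | Fourth Mars Red
```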
If you want to see if any word in any Favorite is contained in any Planet, then you only need to split the Favorites into words, and then see if any planet contains any word.
So, to get all the Favorite words, we can do something like this:
var favoriteWords = Favorites.SelectMany(i => i.Split(' '));
Now, we can loop through all the planets and see if we have any matches:
Planets.ForEach(p =>
{
    if (favoriteWords.Any(p.Contains))
    {
        Console.WriteLine($"One of your favorite planets is: {p}");
    }
});
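And the result is:
One of your favorite planets is: Second Venus Yellow
One of your favorite planets is: Fourth Mars Red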
Or, if you just wanted to show the favorite words that were matched, you could do something like:
Console.WriteLine("These favorite words were matched: ");
Planets.ForEach(p => favoriteWords.Where(p.Contains).ToList().ForEach(Console.WriteLine));
Don't use arrays if you don't know the number of items you want them to hold and DON'T knowingly set variables to null.
Use Lists instead:
List<string> planets = new List<string>();
List<string> favorites = new List<string>();
That being said, your code is totally wrong.
What you are trying to achieve is something like this:
List<string> Planets = new List<string>();
Planets.Add("First Mercury Gray");
Planets.Add("Second Venus Yellow");
Planets.Add("Third Earth Blue");
Planets.Add("Fourth Mars Red");
List<string> Favorites = new List<string>();
Favorites.Add("Venus Hot");
Favorites.Add("Mars Cold");
// Unless you need favorites to hold tokens separated by a white space,
// you shouldn't make another list such as this one:
List<string> faveKeywords = Favorites.SelectMany(fave => fave.Split(' ')).ToList();
foreach (var token in from line in Planets from token in line.Split(' ') where faveKeywords.Contains(token) select token)
{
    Console.WriteLine($"Favorite detected: {token}");
}
Or, if you insist on building the Order, Names and Colors lists:
foreach (var tokens in Planets.Select(str => str.Split(' ')))
{
    Order.Add(tokens[0]);
    Names.Add(tokens[1]);
    Colors.Add(tokens[2]);
    foreach (var token in tokens.Where(token => faveKeywords.Contains(token)))
    {
        Console.WriteLine($"Favorite detected: {token}");
    }
}
You need to learn from examples such as this one and observe more than you ask.

Get the Substrings within the List<string> collection using Linq

I have a collection (a List<string>):
List<string> mycollections = new List<string>(new string[]
{
    "MyImages/Temp/bus.jpg",
    "MyImages/Temp/car.jpg",
    "MyImages/Temp/truck.jpg",
    "MyImages/Temp/plane.jpg",
    "MyImages/Temp/ship.jpg",
});
I need only the file names in a list, such as bus.jpg, car.jpg, and so on. I do not need the "MyImages/Temp/" portion of each string.
I tried with Substring and Split with Linq queries but couldn't get the expected result.
Use Path.GetFileName instead of Substring, like this:
var fileNames = mycollections.Select(r => Path.GetFileName(r)).ToList();
To print the output:
var fileNames = mycollections.Select(r => Path.GetFileName(r));
foreach (var item in fileNames)
{
    Console.WriteLine(item);
}
Output:
bus.jpg
car.jpg
truck.jpg
plane.jpg
ship.jpg
How about this:
mycollections.Select(s => s.Split('/').Last());
That will split each string by slashes and return the last item.
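Since the question mentions trying Substring, here is a minimal sketch of that route too: take everything after the last '/'. When a string contains no '/', LastIndexOf returns -1, so Substring(0) returns the whole string unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<string> mycollections = new List<string>
{
    "MyImages/Temp/bus.jpg",
    "MyImages/Temp/car.jpg",
    "MyImages/Temp/truck.jpg",
    "MyImages/Temp/plane.jpg",
    "MyImages/Temp/ship.jpg",
};

// Everything after the last '/' (or the whole string if there is no '/').
List<string> fileNames = mycollections
    .Select(s => s.Substring(s.LastIndexOf('/') + 1))
    .ToList();

fileNames.ForEach(Console.WriteLine); // bus.jpg, car.jpg, truck.jpg, plane.jpg, ship.jpg
```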

Conditional Split String with multiple delimiters

I have a String
string astring = "#This is a Section*This is the first category*This is the second Category# This is another Section";
I want to separate this string according to the delimiters. If a part starts with #, it is a section (string[] section). If it starts with *, it is a category (string[] category).
As a result I want to have:
string[] section = { "This is a Section", "This is another Section" };
string[] category = { "This is the first category ",
"This is the second Category " };
I have found this answer:
string.split - by multiple character delimiter
But it is not what I am trying to do.
string astring = @"#This is a Section*This is the first category*This is the second Category# This is another Section";
string[] sections = Regex.Matches(astring, @"#([^\*#]*)").Cast<Match>()
    .Select(m => m.Groups[1].Value).ToArray();
string[] categories = Regex.Matches(astring, @"\*([^\*#]*)").Cast<Match>()
    .Select(m => m.Groups[1].Value).ToArray();
With string.Split you could do this (faster than the regex ;) ):
List<string> sectionsResult = new List<string>();
List<string> categorysResult = new List<string>();
string astring = "#This is a Section*This is the first category*This is the second Category# This is another Section";
var sections = astring.Split('#').Where(i => !String.IsNullOrEmpty(i));
foreach (var section in sections)
{
    var sectieandcategorys = section.Split('*');
    sectionsResult.Add(sectieandcategorys.First());
    categorysResult.AddRange(sectieandcategorys.Skip(1));
}
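A third option is sketched below, assuming every piece of the string starts with either # or *. It matches each delimiter together with the text that follows it in a single pass, and routes the text by which delimiter was captured. Unlike splitting on '#' first, a category that appears before the first section is still classified correctly. Leading and trailing spaces are kept, as in the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string astring = "#This is a Section*This is the first category*This is the second Category# This is another Section";

var sections = new List<string>();
var categories = new List<string>();

// Group 1 = the delimiter ('#' or '*'); group 2 = the text up to the next delimiter.
foreach (Match m in Regex.Matches(astring, @"([#*])([^#*]*)"))
{
    if (m.Groups[1].Value == "#")
        sections.Add(m.Groups[2].Value);
    else
        categories.Add(m.Groups[2].Value);
}

Console.WriteLine(string.Join(" | ", sections));   // This is a Section |  This is another Section
Console.WriteLine(string.Join(" | ", categories)); // This is the first category | This is the second Category
```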
