Extracting the common prefixes from a list of strings - C#

I have a list of strings, such as:
{ abc001, abc002, abc003, cdef001, cdef002, cdef004, ghi002, ghi001 }
I want to get all the common unique prefixes; for example, for the above list:
{ abc, cdef, ghi }
How do I do that?

var list = new List<String> {
"abc001", "abc002", "abc003", "cdef001",
"cdef002", "cdef004", "ghi002", "ghi001"
};
var prefixes = list.Select(x => Regex.Match(x, @"^[^\d]+").Value).Distinct();

It may be a good idea to write a helper class to represent your data. For example:
public class PrefixedNumber
{
private static Regex parser = new Regex(@"^(\p{L}+)(\d+)$");
public PrefixedNumber(string source) // you may want a static Parse method.
{
Match parsed = parser.Match(source); // think about an error here when it doesn't match
Prefix = parsed.Groups[1].Value;
Index = parsed.Groups[2].Value;
}
public string Prefix { get; set; }
public string Index { get; set; }
}
You need to come up with a better name, of course, and better access modifiers.
Now the task is quite easy:
List<string> data = new List<string> { "abc001", "abc002", "abc003", "cdef001",
"cdef002", "cdef004", "ghi002", "ghi001" };
var groups = data.Select(str => new PrefixedNumber(str))
.GroupBy(prefixed => prefixed.Prefix);
The result is all data, parsed, and grouped by the prefix.
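If what you ultimately need is just the list of unique prefixes (plus, say, how many strings share each one), the grouping can be projected onto its keys. This is a minimal sketch, assuming every relevant string is letters followed by digits. The class name PrefixDemo and the method PrefixCounts are hypothetical. Unlike the PrefixedNumber constructor, it silently skips strings that don't match instead of failing.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixDemo
{
    // Letters (any script) followed by digits, same pattern as PrefixedNumber.
    private static readonly Regex Parser = new Regex(@"^(\p{L}+)(\d+)$");

    // Returns each distinct prefix once, with the number of inputs that share it.
    public static List<(string Prefix, int Count)> PrefixCounts(IEnumerable<string> data) =>
        data.Select(s => Parser.Match(s))
            .Where(m => m.Success)              // skip strings that are not letters+digits
            .GroupBy(m => m.Groups[1].Value)    // group by the letter prefix
            .Select(g => (Prefix: g.Key, Count: g.Count()))
            .ToList();
}
```

GroupBy yields groups in the order their keys first appear, so for the list in the question this should give abc (3), cdef (3), ghi (2).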

You can achieve that by using a regular expression to select the text part, and then adding that text part to a HashSet<string> so that no duplicates are added:
using System.Text.RegularExpressions;
//simulate your real list
List<string> myList = new List<string>(new string[] { "abc001", "abc002", "cdef001" });
string pattern = @"^(\D*)\d+$";
// \D* matches any non-digit characters, and \d+ means they must be followed by at least one digit.
// Note: if you also want to capture strings like "abc" that are not followed by numbers,
// use the pattern "^(\D*)\d*$" instead.
Regex regex = new Regex(pattern);
HashSet<string> matchesStrings = new HashSet<string>();
foreach (string item in myList)
{
var match = regex.Match(item);
if (match.Success)
{
matchesStrings.Add(match.Groups[1].Value);
}
}
result:
abc, cdef

Assuming that your prefix consists only of letters and is terminated by the first non-letter character, you could use the following LINQ expression:
List<string> listOfStrings = new List<String>()
{ "abc001d", "abc002", "abc003", "cdef001", "cdef002", "cdef004", "ghi002", "ghi001" };
var prefixes = (from s in listOfStrings
select new string(s.TakeWhile(c => char.IsLetter(c)).ToArray())).Distinct();

Related

Good approach to sorting strings that contain numbers, such as test 1, test10, test2

I am using C#, .NET 4.7.
I have 3 strings:
[test.1, test.10, test.2]
I need to sort them to get:
test.1
test.2
test.10
I may get other strings like
[1test, 10test, 2test]
which should produce:
1test
2test
10test
using the same approach.
Thoughts?
Thanks in advance.
You could parse the number using Regex and then sort the strings. For example:
Regex re = new Regex(@"\d+");
var result = strArray.Where(x=>re.Match(x).Success)
.Select(x=> new { Key = int.Parse(re.Match(x).Value),Value = x})
.OrderBy(x=>x.Key).Select(x=>x.Value);
Here strArray is the collection of strings.
Please note that in the above case you are ignoring strings that don't have a numeric part (as that case wasn't described in the OP). The numeric part of each string is parsed using Regex and then used to sort the collection.
Example,
Input
var strArray = new string[]{"1test", "10test", "2test"};
Output
1test
2test
10test
Input
var strArray = new string[]{"test.1", "test.10", "test.2"};
Output
test.1
test.2
test.10
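The answer above drops strings without digits. If you'd rather keep them, here is a variation (a sketch, not the answer's own code; NumericSort and SortByFirstNumber are hypothetical names). It sorts on the first run of digits and puts number-less strings last. It assumes each digit run fits in a long.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumericSort
{
    private static readonly Regex FirstNumber = new Regex(@"\d+");

    public static List<string> SortByFirstNumber(IEnumerable<string> items) =>
        items.Select(s => (Text: s, Digits: FirstNumber.Match(s)))
             // strings that contain a number come first...
             .OrderBy(x => x.Digits.Success ? 0 : 1)
             // ...ordered by the numeric value of their first digit run
             .ThenBy(x => x.Digits.Success ? long.Parse(x.Digits.Value) : 0L)
             // ties and number-less strings fall back to ordinal text order
             .ThenBy(x => x.Text, StringComparer.Ordinal)
             .Select(x => x.Text)
             .ToList();
}
```

For both inputs in the question this should produce the orders shown above. A digit run longer than a long can hold will still make long.Parse throw.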
For your first array you can do
var array = new[] { "test.1", "test.10", "test.2" };
var sortedArray = array.OrderBy(s => int.Parse(s.Substring(5, s.Length - 5)));
For the second array
var array = new[] { "1test", "2test", "10test" };
var sortedArray = array.OrderBy(s => int.Parse(s.Substring(0, s.Length - 4)));
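Try this code. It uses SortedDictionary, which always keeps its items sorted by key as they are inserted. Note that SortedDictionary.Add throws an ArgumentException if two strings produce the same number.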
static void Main(string[] args)
{
SortedDictionary<int, string> tuples = new SortedDictionary<int, string>();
string[] stringsToSortByNumbers = { "test.1", "test.10", "test.2" };
foreach (var item in stringsToSortByNumbers)
{
int numeric = Convert.ToInt32(new String(item.Where(Char.IsDigit).ToArray()));
tuples.Add(numeric, item);
}
foreach (var item in tuples)
{
Console.WriteLine(item.Value);
}
Console.ReadKey();
}

How to create a readable List<string> output?

My code is as follows:
public List<string> connect(String query_physician, String query_institution)
{
Regex pattern = new Regex(@"(?<=""link""\:\s"")[^""]*(?="")");
MatchCollection linkMatches = pattern.Matches(customSearchResult);
var list = new List<string>();
list = linkMatches.Cast<Match>().Select(match => match.Value).ToList(); //put the links into a list?!
foreach (var item in list) //take each item (link) out of the list...
{
return item; // ...and return it?! //Error, because item is a string
}
return null;
}
As you can see, I want to return each link (as a readable list of my JSON result) and display it in my RichTextBox, but var item is a string, so it doesn't work. I either get an unreadable list, or a single string (with string.Join(.....Cast<>())).
Am I right that string.Join(.....Cast<>()) concatenates the single strings? I don't want them joined together, though. Anyway, do you know a way to solve this problem?
By the way, return null is only a placeholder.
As I understand it, this is a continuation of your previous question. Assuming you have this function (I simplified it a bit):
public List<string> connect(String query_physician, String query_institution)
{
...
return Regex.Matches(customSearchResult, @"(?<=""link""\:\s"")[^""]*(?="")")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
}
You can do the following:
List<string> list = connect("", "");
string linksFormatted = string.Join(",", list);
To show the content in RichTextBox:
richTextBox1.AppendText(string.Join(Environment.NewLine, list));
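On the question of what string.Join does: yes, it concatenates the items into one string, but the separator decides the layout. With Environment.NewLine as the separator each link ends up on its own line. A minimal sketch; LinkFormatting and OnePerLine are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class LinkFormatting
{
    // Joins the links into a single string, inserting a line break
    // between consecutive items (none before the first or after the last).
    public static string OnePerLine(IEnumerable<string> links) =>
        string.Join(Environment.NewLine, links);
}
```

Alternatively, a WinForms RichTextBox has a settable Lines property (string[]), so richTextBox1.Lines = list.ToArray(); should also show one link per line without joining manually.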
Look at your method signature: the return type is List<string>, not string,
so the simplest approach is:
public List<string> connect(String query_physician, String query_institution)
{ ...
// results container
List<string> resultContainer = new List<String>();
Regex pattern = new Regex(@"(?<=""link""\:\s"")[^""]*(?="")");
MatchCollection linkMatches = pattern.Matches(customSearchResult);
var list = new List<string>();
list = linkMatches.Cast<Match>().Select(match => match.Value).ToList(); //put the links into a list?!
foreach (var item in list) //take each item (link) out of the list...
{
//add item to list
resultContainer.Add(item);
}
return resultContainer;
}

Conditional Split String with multiple delimiters

I have a String
string astring = "#This is a Section*This is the first category*This is the second Category# This is another Section";
I want to separate this string according to its delimiters. If a part starts with #, it is a section (string[] section). If it starts with *, it is a category (string[] category).
As a result I want to have
string[] section = { "This is a Section", "This is another Section" };
string[] category = { "This is the first category ",
"This is the second Category " };
I have found this answer:
string.split - by multiple character delimiter
But it is not what I am trying to do.
string astring = @"#This is a Section*This is the first category*This is the second Category# This is another Section";
string[] sections = Regex.Matches(astring, @"#([^\*#]*)").Cast<Match>()
.Select(m => m.Groups[1].Value).ToArray();
string[] categories = Regex.Matches(astring, @"\*([^\*#]*)").Cast<Match>()
.Select(m => m.Groups[1].Value).ToArray();
With string.Split you could do this (faster than the regex ;) ):
List<string> sectionsResult = new List<string>();
List<string> categorysResult = new List<string>();
string astring="#This is a Section*This is the first category*This is the second Category# This is another Section";
var sections = astring.Split('#').Where(i=> !String.IsNullOrEmpty(i));
foreach (var section in sections)
{
var sectieandcategorys = section.Split('*');
sectionsResult.Add(sectieandcategorys.First());
categorysResult.AddRange(sectieandcategorys.Skip(1));
}
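Here is the same Split approach wrapped in a reusable method that also trims the surrounding spaces. SectionParser and Parse are hypothetical names. The sketch assumes every category belongs to the section before it and that the text starts with '#'. Any text before the first '#' would otherwise be treated as a section.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SectionParser
{
    public static (string[] Sections, string[] Categories) Parse(string input)
    {
        var sections = new List<string>();
        var categories = new List<string>();

        // Each '#'-delimited block is "section*category*category..."
        foreach (var block in input.Split(new[] { '#' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var parts = block.Split('*');
            sections.Add(parts[0].Trim());                              // text before the first '*'
            categories.AddRange(parts.Skip(1).Select(p => p.Trim()));   // everything after it
        }

        return (sections.ToArray(), categories.ToArray());
    }
}
```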

See if items in a string list contain a certain string

Basically, I have a string list like this:
/forum/
/phpld/
/php/
Now I want to check whether the URL
http://www.url.com/forum/
contains any of the values from the string list.
In the above case it should match because /forum/ is in the url.
I was thinking something like this:
foreach (string filter in _filterList)
{
if (PAGEURL.Trim().Contains(filter.Trim()))
{
_parseResultsFinal.Add(PAGEURL);
filteredByURL++;
break;
}
}
But I cannot get the above to be accurate.
How would I do this? :)
Try this:
_filterList.Any(filter => PAGEURL.Trim().Contains(filter.Trim()));
You may do PAGEURL = PAGEURL.Trim() before this expression so that it isn't trimmed again for every filter.
String.Contains() is case-sensitive and culture-insensitive, so if there are any case differences that could be the cause of the 'inaccuracy' that you are experiencing.
If you suspect this may be the problem (or even as a viable alternative) you can try this as the 'if' clause:
if (PAGEURL.Trim().IndexOf(filter.Trim(), StringComparison.OrdinalIgnoreCase) >= 0)
I'm not entirely clear on what you want to do here; it seems that if a URL contains any of the filters, you want to add the URL to the list.
List<string> parseResultsFinal = new List<string>();
if (_filterList.Any(x => PAGEURL.Contains(x)))
{
parseResultsFinal.Add(PAGEURL);
}
Try to use that.
I would try the following:
var trimmedUrl = PAGEURL.Replace("http://", "");
var parts = trimmedUrl.Split('/');
var filterList = new List<string> { "forum", "phpld", "php" };
var anyContains = parts.Any(o => filterList.Contains(o));
I'd change the segment filters to simple words (without slashes, trimmed before adding them to the filter list):
var _filterList = new List<string>()
{
"forum", "phpld", "php"
};
And use a regex to search for segments in the URL (ignoring case, with an optional slash at the end of the URL):
bool IsSegmentInUrl(string url, string segment)
{
string pattern = String.Format(".*/{0}(/|$)", Regex.Escape(segment));
return Regex.IsMatch(url, pattern, RegexOptions.IgnoreCase);
}
Usage:
if (_filterList.Any(filter => IsSegmentInUrl(PAGEURL, filter)))
{
_parseResultsFinal.Add(PAGEURL);
filteredByURL++;
}
A more readable solution is to create an extension method (it has to be declared in a static class):
public static bool ContainsSegment(this string url, string segment)
{
string pattern = String.Format("https?://.*/{0}(/|$)", Regex.Escape(segment));
return Regex.IsMatch(url, pattern, RegexOptions.IgnoreCase);
}
Now the code is self-describing:
if (_filterList.Any(filter => PAGEURL.ContainsSegment(filter)))
{
_parseResultsFinal.Add(PAGEURL);
filteredByURL++;
}
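The split-based idea from the answers above can be made a little more robust. Let System.Uri separate the path from the scheme and host, then compare whole segments case-insensitively, so a filter such as php cannot match inside a longer segment such as phpld. This is a sketch under the assumption that PAGEURL is always an absolute URL (new Uri throws a UriFormatException otherwise). UrlFilter and HasAnySegment are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UrlFilter
{
    public static bool HasAnySegment(string url, IEnumerable<string> segments)
    {
        // "http://www.url.com/forum/" -> AbsolutePath "/forum/" -> ["forum"]
        var pathSegments = new Uri(url.Trim()).AbsolutePath
            .Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        // Filters may be written as "/forum/" or "forum"; strip slashes and spaces.
        return segments.Any(filter =>
            pathSegments.Contains(filter.Trim('/', ' '), StringComparer.OrdinalIgnoreCase));
    }
}
```

Usage would look like if (UrlFilter.HasAnySegment(PAGEURL, _filterList)) { ... }.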

Enumerated int List<GradeRange>

I really have no clue about enumerated lists, but after some research I found that this kind of list may help solve my problem. I have a string in my settings called strGrades, which is a set of ranges that I update manually: 0155-0160, 0271-0388, 0455-0503, 0588-687. What I basically want is to find the values that are not in this grade list (for example 0161, 0389, 0504-0587...).
So I came up with a function that will allow me to get each match in the grade range:
public static List<GradeRange> GetValidGrades()
{
MatchCollection matches= Regex.Matches(Settings.Default.productRange,
Settings.Default.srGradeRange);
List<GradeRange> ranges= new List<GradeRange>();
if(matches.Count >0)
{
foreach (Match match in matches)
{
ranges.Add(new GradeRange() {
Start= int.Parse(match.Groups["Start"].Value),
Stop= int.Parse(match.Groups["Stop"].Value)
});
}
}
return ranges;
}
Here is the grade range class:
public class GradeRange
{
public int Start { get; set; }
public int Stop { get; set; }
}
}
So the function above captures my Start and Stop values. Can anyone please help me get this into a list where I can find the values that fall outside of the ranges? I just need a starting point. Thanks so much!
You could write a custom Between extension method and use it together with Where:
var myFilteredList = list.Where(x=>!myValue.Between(x.Start, x.Stop, true));
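Between is not part of .NET, so the answer assumes you write it yourself. Here is one possible shape for it. It is a hypothetical helper: the name and the inclusive flag simply mirror the call above.

```csharp
using System;

public static class ComparableExtensions
{
    // True when value lies between lower and upper;
    // 'inclusive' decides whether the endpoints themselves count.
    public static bool Between<T>(this T value, T lower, T upper, bool inclusive)
        where T : IComparable<T>
    {
        int lo = value.CompareTo(lower);
        int hi = value.CompareTo(upper);
        return inclusive ? lo >= 0 && hi <= 0 : lo > 0 && hi < 0;
    }
}
```

Because int implements IComparable<int>, calls like grade.Between(range.Start, range.Stop, true) compile with T inferred as int.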
This isn't the most performant answer, but if you need a list of all the numbers that are not between certain ranges, then you could do something like this:
var missingNumbers = new List<int>();
var minStop = list.Min(r => r.Stop);
var maxStart = list.Max(r => r.Start);
// Enumerable.Range takes a start and a count, not an end value
foreach (var x in Enumerable.Range(minStop, maxStart - minStop + 1))
{
// keep x only if it lies inside none of the ranges
if (!list.Any(r => x.Between(r.Start, r.Stop, true)))
missingNumbers.Add(x);
}
Here, this should get you started:
var strings = "0155-0160, 0271-0388, 0455-0503, 0588-687";
var splitStrings = strings.Split(',');
var grads = new List<GradeRange>();
foreach (var item in splitStrings) {
var splitAgain = item.Split('-');
var grand = new GradeRange
{
// int.Parse tolerates the leading space left over from splitting on ','
Start = int.Parse(splitAgain[0]),
Stop = int.Parse(splitAgain[1])
};
grads.Add(grand);
}
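Since the question really asks for the values outside the ranges, one option after parsing is to sort the ranges by Start and emit the gaps between them, rather than enumerating every number. This sketch assumes the GradeRange class from the question (repeated here so the block compiles on its own) and values well below int.MaxValue. It only reports gaps between the lowest Start and the highest Stop. GradeGaps and FindGaps are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public class GradeRange
{
    public int Start { get; set; }
    public int Stop { get; set; }
}

public static class GradeGaps
{
    public static List<GradeRange> FindGaps(IEnumerable<GradeRange> ranges)
    {
        var gaps = new List<GradeRange>();
        int? coveredUpTo = null;   // highest value covered by the ranges seen so far

        foreach (var r in ranges.OrderBy(r => r.Start))
        {
            // A gap exists if this range starts after the covered area plus one.
            if (coveredUpTo.HasValue && r.Start > coveredUpTo.Value + 1)
            {
                gaps.Add(new GradeRange { Start = coveredUpTo.Value + 1, Stop = r.Start - 1 });
            }
            // Overlapping or nested ranges only extend coverage, never shrink it.
            if (!coveredUpTo.HasValue || r.Stop > coveredUpTo.Value)
            {
                coveredUpTo = r.Stop;
            }
        }
        return gaps;
    }
}
```

For the ranges in the question this should return 161-270, 389-454 and 504-587, which lines up with the examples the asker gave (0161, 0389, 0504-0587).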
