I have an array of strings similar to this (shown on separate lines to illustrate the pattern):
{ "aa002","aa003","aa004","aa005","aa006","aa007", // note that aa008 is missing
"aa009",
"ba023","ba024","ba025",
"bb025",
"ca002","ca003",
"cb004",
...}
...and the goal is to collapse those strings into this comma-separated string of "ranges":
"aa002-aa007,aa009,ba023-ba025,bb025,ca002-ca003,cb004, ... "
I want to collapse them so I can construct a URL. There are hundreds of elements, but I can still convey all the information if I collapse them this way - putting them all into a URL "longhand" (it has to be a GET, not a POST) isn't feasible.
I've had the idea to separate them into groups using the first two characters as the key - but does anyone have any clever ideas for collapsing those sequences (without gaps) into ranges? I'm struggling with it, and everything I've come up with looks like spaghetti.
So the first thing that you need to do is parse the strings. It's important to have the alphabetic prefix and the integer value separately.
Next you want to group the items on the prefix.
For each of the items in that group, you want to order them by number, and then group items while the previous value's number is one less than the current item's number. (Or, put another way, while the previous item plus one is equal to the current item.)
Once you've grouped all of those items you want to project that group out to a value based on that range's prefix, as well as the first and last number. No other information from these groups is needed.
We then flatten the list of strings for each group into just a regular list of strings, since once we're all done there is no need to separate out ranges from different groups. This is done using SelectMany.
When that's all said and done, that, translated into code, is this:
public static IEnumerable<string> Foo(IEnumerable<string> data)
{
return data.Select(item => new
{
Prefix = item.Substring(0, 2),
Number = int.Parse(item.Substring(2))
})
.GroupBy(item => item.Prefix)
.SelectMany(group => group.OrderBy(item => item.Number)
.GroupWhile((prev, current) =>
prev.Number + 1 == current.Number)
.Select(range =>
RangeAsString(group.Key,
range.First().Number,
range.Last().Number)));
}
The GroupWhile method can be implemented like so:
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
this IEnumerable<T> source, Func<T, T, bool> predicate)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
List<T> list = new List<T>() { iterator.Current };
T previous = iterator.Current;
while (iterator.MoveNext())
{
if (!predicate(previous, iterator.Current))
{
yield return list;
list = new List<T>();
}
list.Add(iterator.Current);
previous = iterator.Current;
}
yield return list;
}
}
And then the simple helper method to convert each range into a string:
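private static string RangeAsString(string prefix, int start, int end)
{
// Parsing to int drops leading zeros, so pad back to the three digits
// used in the sample data ("aa002"); adjust the width if yours differs.
if (start == end)
return prefix + start.ToString("D3");
else
return string.Format("{0}{1:D3}-{0}{2:D3}", prefix, start, end);
}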
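For comparison, the same grouping can be sketched as a plain loop over the sorted input. This is only an illustration under stated assumptions: every item is a two-character prefix followed by digits (as in the sample data), and the names RangeCollapser and Collapse are hypothetical. It reuses the original strings for the range endpoints, so zero padding is preserved without reformatting the numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeCollapser
{
    // Hypothetical helper: assumes each item is a 2-character prefix
    // followed by digits, e.g. "aa002".
    public static string Collapse(IEnumerable<string> items)
    {
        // Sort by prefix, then by numeric suffix.
        var sorted = items
            .OrderBy(s => s.Substring(0, 2), StringComparer.Ordinal)
            .ThenBy(s => int.Parse(s.Substring(2)))
            .ToList();

        var ranges = new List<string>();
        int i = 0;
        while (i < sorted.Count)
        {
            int j = i;
            // Extend the run while the prefix matches and the number grows by one.
            while (j + 1 < sorted.Count
                   && sorted[j + 1].Substring(0, 2) == sorted[j].Substring(0, 2)
                   && int.Parse(sorted[j + 1].Substring(2)) == int.Parse(sorted[j].Substring(2)) + 1)
            {
                j++;
            }
            // Single items stay as-is; longer runs become "first-last".
            ranges.Add(i == j ? sorted[i] : sorted[i] + "-" + sorted[j]);
            i = j + 1;
        }
        return string.Join(",", ranges);
    }
}
```

Calling RangeCollapser.Collapse on the array gives a single comma-separated string that can be appended to the URL.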
Here's a LINQ version without the need to add new extension methods:
var data2 = data.Skip(1).Zip(data, (d1, d0) => new
{
value = d1,
jump = d1.Substring(0, 2) == d0.Substring(0, 2)
? int.Parse(d1.Substring(2)) - int.Parse(d0.Substring(2))
: -1,
});
var agg = new { f = data.First(), t = data.First(), };
var query2 =
data2
.Aggregate(new [] { agg }.ToList(), (a, x) =>
{
var last = a.Last();
if (x.jump == 1)
{
a.RemoveAt(a.Count() - 1);
a.Add(new { f = last.f, t = x.value, });
}
else
{
a.Add(new { f = x.value, t = x.value, });
}
return a;
});
var query3 =
from q in query2
select (q.f) + (q.f == q.t ? "" : "-" + q.t);
I get these results:
I am working with a sizeable set of data (~130,000 records), and I've managed to transform it the way I want (to CSV).
Here is a simplified example of what the list looks like:
"Surname1, Name1;Address1;State1;YES;Group1"
"Surname2, Name2;Address2;State2;YES;Group2"
"Surname2, Name2;Address2;State2;YES;Group1"
"Surname3, Name3;Address3;State3;NO;Group1"
"Surname1, Name1;Address2;State1;YES;Group1"
Now, I would like to merge the records if 1st, 2nd AND 3rd column match, like so:
output
"Surname1, Name1;Address1;State1;YES;Group1"
"Surname2, Name2;Address2;State2;YES;Group2 Group1"
"Surname3, Name3;Address3;State3;NO;Group1"
"Surname1, Name1;Address2;State1;YES;Group1"
Here's what I've got so far:
output.GroupBy(x => new { c1 = x.Split(';')[0], c2 = x.Split(';')[1], c3 = x.Split(';')[2] }).Select(//have no idea what should go here);
First try to get the columns you need projecting the result in an anonymous type:
var query= from r in output
let columns= r.Split(';')
select new { c1 =columns[0], c2 =columns[1], c3 = columns[2] ,c5=columns[4]};
And then create the groups but now using the anonymous object you define in the previous query:
var result= query.GroupBy(e=>new {e.c1, e.c2, e.c3})
.Select(g=> new {SurName=g.Key.c1,
Name=g.Key.c2,
Address=g.Key.c3,
Groups=String.Join(",",g.Select(e=>e.c5))});
I know I'm missing some columns but I think you can get the idea.
PS: I have separated the logic into two queries only for readability. You can compose both queries into one, but that is not going to change the performance, because LINQ uses deferred evaluation.
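As a rough sketch of how the full rows could be rebuilt in one query, assuming every record has exactly five semicolon-separated columns (the names RecordMerger and Merge are hypothetical): anonymous types compare by value, so they can serve as the grouping key without a custom comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordMerger
{
    // Hypothetical helper: assumes each row has exactly 5 ';'-separated columns.
    public static List<string> Merge(IEnumerable<string> rows)
    {
        return rows
            .Select(r => r.Split(';'))
            // Group on the first three columns; anonymous types use value equality.
            .GroupBy(c => new { C1 = c[0], C2 = c[1], C3 = c[2] })
            // Keep the first four columns of the first row in the group,
            // then append every group's last column, separated by spaces.
            .Select(g => string.Join(";", g.First().Take(4))
                         + ";" + string.Join(" ", g.Select(c => c[4])))
            .ToList();
    }
}
```

The fourth column (YES/NO) is taken from the first row of each group, which is an assumption; rows that differ only in that column would be merged.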
This is how I would do it:
class Program
{
static void Main(string[] args)
{
List<string> input = new List<string> {
"Surname1, Name1;Address1;State1;YES;Group1",
"Surname2, Name2;Address2;State2;YES;Group2",
"Surname2, Name2;Address2;State2;YES;Group1",
"Surname3, Name3;Address3;State3;NO;Group1",
"Surname1, Name1;Address2;State1;YES;Group1",
};
var transformed = input.Select(s => s.Split(';'))
.GroupBy( s => new string[] { s[0], s[1], s[2], s[3] },
(key, elements) => string.Join(";", key) + ";" + string.Join(" ", elements.Select(e => e.Last())),
new MyEqualityComparer())
.ToList();
}
}
internal class MyEqualityComparer : IEqualityComparer<string[]>
{
public bool Equals(string[] x, string[] y)
{
return x[0] == y[0] && x[1] == y[1] && x[2] == y[2];
}
public int GetHashCode(string[] obj)
{
int hashCode = obj[0].GetHashCode();
hashCode = hashCode ^ obj[1].GetHashCode();
hashCode = hashCode ^ obj[2].GetHashCode();
return hashCode;
}
}
Consider the first 4 columns as the grouping key, but only use the first 3 for the comparison (hence the custom IEqualityComparer).
Then if you have the (key, elements) groups, transform them so that you join the elements of the key with ; (remember, the key consists of the first 4 columns) and add to it the last element from every member of the group, joined with a space.
I have a linq query result as shown in the image. In the final query (not shown) I am grouping by Year by LeaveType. However I want to calculate a running total for the leaveCarriedOver per type over years. That is, sick LeaveCarriedOver in 2010 becomes "opening" balance for sick leave in 2011 plus the one for 2011.
I have done another query on the shown result list that looks like:
var leaveDetails1 = (from l in leaveDetails
select new
{
l.Year,
l.LeaveType,
l.LeaveTaken,
l.LeaveAllocation,
l.LeaveCarriedOver,
RunningTotal = leaveDetails.Where(x => x.LeaveType == l.LeaveType).Sum(x => x.LeaveCarriedOver)
});
where leaveDetails is the result from the image.
The resulting RunningTotal is not cumulative as expected. How can I achieve my initial goal? I'm open to any ideas; my last option would be to do it in JavaScript in the front end. Thanks in advance.
The simple implementation is to get the list of possible totals first then get the sum from the details for each of these categories.
Getting the distinct list of Year and LeaveType is a group-by followed by selecting the first item of each group. The grouping key is a Tuple<int, string>, where the int is the year and the string is the LeaveType.
var distinctList = leaveDetails1.GroupBy(data => new Tuple<int, string>(data.Year, data.LeaveType)).Select(data => data.FirstOrDefault()).ToList();
Then we want a total for each of these elements, so we project that list into the identifying pair (Year and LeaveType) plus the total, which adds an extra value to give a Tuple<int, string, int>.
var totals = distinctList.Select(data => new Tuple<int, string, int>(data.Year, data.LeaveType, leaveDetails1.Where(detail => detail.Year == data.Year && detail.LeaveType == data.LeaveType).Sum(detail => detail.LeaveCarriedOver))).ToList();
Reading the line above, you can see that it takes each distinct Year/LeaveType pair, creates a new object, stores the Year and LeaveType for reference, and then sets the last int to the Sum of the filtered details for that Year and LeaveType.
If I completely understand what you are trying to do then I don't think I would rely on the built in LINQ operators exclusively. I think (emphasis on think) that any combination of the built in LINQ operators is going to solve this problem in O(n^2) run-time.
If I were going to implement this in LINQ then I would create an extension method for IEnumerable that is similar to the Scan function in reactive extensions (or find a library out there that has already implemented it):
public static class EnumerableExtensions
{
public static IEnumerable<TAccumulate> Scan<TSource, TAccumulate>(
this IEnumerable<TSource> source,
TAccumulate seed,
Func<TAccumulate, TSource, TAccumulate> accumulator)
{
// Validation omitted for clarity.
foreach(TSource value in source)
{
seed = accumulator.Invoke(seed, value);
yield return seed;
}
}
}
Then this should do it around O(n log n) (because of the order by operations):
leaveDetails
.OrderBy(x => x.LeaveType)
.ThenBy(x => x.Year)
.Scan(new {
Year = 0,
LeaveType = "Seed",
LeaveTaken = 0,
LeaveAllocation = 0.0,
LeaveCarriedOver = 0.0,
RunningTotal = 0.0
},
(acc, x) => new {
x.Year,
x.LeaveType,
x.LeaveTaken,
x.LeaveAllocation,
x.LeaveCarriedOver,
RunningTotal = x.LeaveCarriedOver + (acc.LeaveType != x.LeaveType ? 0 : acc.RunningTotal)
});
You don't say, but I assume the data is coming from a database; if that is the case then you could get leaveDetails back already sorted and skip the sorting here. That would get you down to O(n).
If you don't want to create an extension method (or go find one) then this will achieve the same thing (just in an uglier way).
var temp = new
{
Year = 0,
LeaveType = "Who Cares",
LeaveTaken = 3,
LeaveAllocation = 0.0,
LeaveCarriedOver = 0.0,
RunningTotal = 0.0
};
var runningTotals = (new[] { temp }).ToList();
runningTotals.RemoveAt(0);
foreach(var l in leaveDetails.OrderBy(x => x.LeaveType).ThenBy(x => x.Year))
{
var s = runningTotals.LastOrDefault();
runningTotals.Add(new
{
l.Year,
l.LeaveType,
l.LeaveTaken,
l.LeaveAllocation,
l.LeaveCarriedOver,
RunningTotal = l.LeaveCarriedOver + (s == null || s.LeaveType != l.LeaveType ? 0 : s.RunningTotal)
});
}
This should also be O(n log n) or O(n) if you can pre-sort leaveDetails.
If I understand the question you want something like
decimal RunningTotal = 0;
var results = leaveDetails
.GroupBy(r=>r.LeaveType)
.Select(r=> new
{
Dummy = RunningTotal = 0 ,
results = r.OrderBy(o=>o.Year)
.Select(l => new
{
l.Year,
l.LeaveType ,
l.LeaveAllocation,
l.LeaveCarriedOver,
RunningTotal = (RunningTotal = RunningTotal + l.LeaveCarriedOver )
})
})
.SelectMany(a=>a.results).ToList();
This is basically using the Select<TSource, TResult> overload to calculate the running balance, but first grouped by LeaveType so we can reset the RunningTotal for every LeaveType, and then ungrouped at the end.
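If mutating a captured variable inside Select feels fragile (the result depends on the query being enumerated exactly once and in order), the same grouping idea can be sketched with an explicit loop. The LeaveDetail class, its decimal LeaveCarriedOver, and the RunningTotals helper are assumptions for illustration, not the asker's actual types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row of leaveDetails.
public class LeaveDetail
{
    public int Year { get; set; }
    public string LeaveType { get; set; }
    public decimal LeaveCarriedOver { get; set; }
}

public static class RunningTotals
{
    // Pairs each row with the cumulative LeaveCarriedOver for its LeaveType,
    // accumulated in ascending Year order.
    public static List<Tuple<LeaveDetail, decimal>> Compute(IEnumerable<LeaveDetail> details)
    {
        var result = new List<Tuple<LeaveDetail, decimal>>();
        foreach (var group in details.GroupBy(d => d.LeaveType))
        {
            decimal running = 0m; // reset for every leave type
            foreach (var d in group.OrderBy(d => d.Year))
            {
                running += d.LeaveCarriedOver;
                result.Add(Tuple.Create(d, running));
            }
        }
        return result;
    }
}
```

Because the accumulator is local to each group, there is no shared state to reset and the result does not change if the list is enumerated again.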
You need the SUM window function here, which is not supported by EF Core or earlier versions of EF. So just write the SQL and run it via Dapper:
SELECT
l.Year,
l.LeaveType,
l.LeaveTaken,
l.LeaveAllocation,
l.LeaveCarriedOver,
SUM(l.LeaveCarriedOver) OVER (PARTITION BY l.LeaveType ORDER BY l.Year) AS RunningTotal
FROM leaveDetails l
Or, if you are using EF Core, use package linq2db.EntityFrameworkCore
var leaveDetails1 = from l in leaveDetails
select new
{
l.Year,
l.LeaveType,
l.LeaveTaken,
l.LeaveAllocation,
l.LeaveCarriedOver,
RunningTotal = Sql.Ext.Sum(l.LeaveCarriedOver).Over().PartitionBy(l.LeaveType).OrderBy(l.Year).ToValue()
};
// switch to alternative LINQ translator
leaveDetails1 = leaveDetails1.ToLinqToDB();
I am comparing 2 lists and I need to collect occurrences of a subset (modulesToDelete) from the master list (allModules) ONLY when MORE than one occurrence is found. (allModules contains modulesToDelete.) Multiple occurrences of any module in modulesToDelete means those modules are being shared. One occurrence of a module in modulesToDelete means that module is isolated and is safe to delete (it just found itself). I can do this with nested foreach loops, but this is as far as I got with a LINQ expression (which doesn't work):
List<Module> modulesToDelete = { A, B, C, K }
List<string> allModules = {R, A, B, C, K, D, G, T, B, K } // need to flag B and K
var mods = from mod in modulesToDelete
where allModules.Any(name => name.Contains(mod.Name) && mod.Name.Count() > 1)
select mod;
Here are my nested foreach loops, which I want to replace with a LINQ expression:
foreach (Module mod in modulesToDelete)
{
int count = 0;
foreach (string modInAllMods in allModules)
{
if (modInAllMods == mod.Name)
{
count++;
}
}
if (count > 1)
{
m_moduleMarkedForKeep.Add(mod);
}
else if( count == 1)
{
// Delete the linked modules
}
}
You can use a lookup which is similar to a dictionary but allows multiple equal keys and returns an IEnumerable<T> as value.
var nameLookup = modulesToDelete.ToLookup(m => m.Name);
var safeToDelete = modulesToDelete.Where(m => nameLookup[m.Name].Count() == 1);
var sharedModules = modulesToDelete.Where(m => nameLookup[m.Name].Count() > 1);
Edit: However, I don't see how allModules is related at all.
Probably easier and with the desired result on your sample data:
var mods = modulesToDelete.Where(m => allModules.Count(s => s == m.Name) > 1);
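The Count call above rescans allModules once per module. If that ever matters, a possible variation is to count the names once up front. This is a sketch that works on plain name strings; SharedModules and FindShared are hypothetical names, and with Module objects you would pass m.Name instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharedModules
{
    // Returns the names from namesToDelete that occur more than once in allModules.
    public static List<string> FindShared(IEnumerable<string> namesToDelete,
                                          IEnumerable<string> allModules)
    {
        // Count every name in the master list once.
        Dictionary<string, int> counts = allModules
            .GroupBy(name => name)
            .ToDictionary(g => g.Key, g => g.Count());

        return namesToDelete
            .Where(name =>
            {
                int count;
                return counts.TryGetValue(name, out count) && count > 1;
            })
            .ToList();
    }
}
```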
One way to solve this is to use the Intersect function; see:
Intersection of two string array (ignore case)
I have the following LINQ statement:
IEnumerable<Statement> statement = bookmarkCollection.AsEnumerable().Select(
bookmark => new Statement()
{
Title = bookmark.Title,
PageNumber = bookmark.PageNumber
});
Statement has another attribute called NextPageNumber that I need to be able to populate. NextPageNumber is equal to the PageNumber of the next record minus 1. Essentially, something like this:
IEnumerable<Statement> statement = bookmarkCollection.AsEnumerable().Select(
bookmark => new Statement()
{
Title= bookmark.Title,
PageNumber = bookmark.PageNumber,
NextPageNumber = ???
});
UPDATE:
I attempted some of the solutions provided, but I am still on .NET 3.5, so the Tuple method is out. The Zip operation works (I have extension methods that simulate Zip for 3.5), but it does not create a Statement for the last Bookmark. The NextPageNumber for the last bookmark would simply be the number of pages in the PDF.
FINAL UPDATE:
Many thanks to everyone. With your help, I was able to get this working appropriately.
Here is a helper function that maps a sequence into a sequence of pairs where each pair is each item paired with the one that follows it.
public static IEnumerable<Tuple<T, T>> WithNext<T>(this IEnumerable<T> source)
{
using (var iterator = source.GetEnumerator())
{
if(!iterator.MoveNext())
yield break;
T previous = iterator.Current;
while (iterator.MoveNext())
{
yield return Tuple.Create(previous, iterator.Current);
previous = iterator.Current;
}
yield return Tuple.Create(previous, default(T));
}
}
Now you can do:
var query = bookmarkCollection.AsEnumerable()
.WithNext()
.Select(pair => new Statement(){
Title= pair.Item1.Title,
PageNumber = pair.Item1.PageNumber,
NextPageNumber = pair.Item2.PageNumber - 1, //note you'll need to null check for the last item
});
It's probably better to use a for loop, but you can cobble together something using .Zip if you're really set on LINQ:
var strings = new[] { "one", "two", "three", "four", "five" };
var result = strings.Zip(
strings.Skip(1).Concat(Enumerable.Repeat("last", 1)),
(a, b) => new { a, b }
);
Result
one two
two three
three four
four five
five last
var bc = bookmarkCollection.AsEnumerable();
IEnumerable<Statement> statement = bc.Zip(bc.Skip(1),
(b1,b2) => new Statement()
{
Title= b1.Title,
PageNumber = b1.PageNumber,
NextPageNumber = b2.PageNumber - 1
});
EDIT: (per comment below):
If you need to include the last item as well, then you'd best use @Servy's helper method.
You could do this...
var bc = bookmarkCollection.AsEnumerable();
IEnumerable<Statement> statement = bc.Zip(bc.Skip(1).Concat(new Bookmark[] { null }),
(b1,b2) => new Statement()
{
Title= b1.Title,
PageNumber = b1.PageNumber,
NextPageNumber = b2 == null ? 0 : b2.PageNumber - 1
});
...however, I originally suggested Zip only because it was quick and easy, and now it's getting a bit harder to interpret. Therefore, I'd suggest you use @Servy's method with a slight modification to include a selector function:
public static IEnumerable<TResult> WithNext<T, TResult>(this IEnumerable<T> source, Func<T, T, TResult> selector)
{
using (var e = source.GetEnumerator())
{
if (!e.MoveNext()) yield break;
T previous = e.Current;
while (e.MoveNext())
{
yield return selector(previous, e.Current);
previous = e.Current;
}
yield return selector(previous, default(T));
}
}
and use it like:
IEnumerable<Statement> statement = bc.WithNext(
(b1, b2) => new Statement()
{
Title = b1.Title,
PageNumber = b1.PageNumber,
NextPageNumber = b2 == null ? 0 : b2.PageNumber - 1
}).ToList();
LINQ doesn't provide an easy way to do this, short of capturing the result with ToList() or ToArray() and iterating over the list to update each successive record, e.g.:
var statements = bookmarkCollection.AsEnumerable().Select(
bookmark => new Statement()
{
Title= bookmark.Title,
PageNumber = bookmark.PageNumber,
}).ToList();
for (var i = 0; i < statements.Count - 1; i++)
statements[i].NextPageNumber = statements[i+1].PageNumber - 1;
In theory you could also use the Select() overload that takes a Func<T, int, R>, but the mechanics would be the same.
You can use LINQ for this problem with an indexed Select, something like this:
var statement = bookmarkCollection.AsEnumerable().Select(
(bookmark, index) => new Statement()
{
Title = bookmark.Title,
PageNumber = bookmark.PageNumber,
NextPageNumber = index < bookmarkCollection.Count -1 ? bookmarkCollection[index + 1].PageNumber - 1 : -1
});
The code above is setting NextPageNumber to -1 when there is no next record.
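Since the question's update says the last bookmark should end at the PDF's page count, here is a sketch of the indexed Select with that fallback. The Bookmark and Statement classes below are minimal stand-ins, and StatementBuilder, Build, and totalPages are hypothetical names; the bookmarks are assumed to be in an indexable list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins for the types in the question.
public class Bookmark
{
    public string Title { get; set; }
    public int PageNumber { get; set; }
}

public class Statement
{
    public string Title { get; set; }
    public int PageNumber { get; set; }
    public int NextPageNumber { get; set; }
}

public static class StatementBuilder
{
    // totalPages is assumed to be the number of pages in the PDF.
    public static List<Statement> Build(IList<Bookmark> bookmarks, int totalPages)
    {
        return bookmarks
            .Select((bookmark, index) => new Statement
            {
                Title = bookmark.Title,
                PageNumber = bookmark.PageNumber,
                // Look ahead one item; the last bookmark runs to the end of the document.
                NextPageNumber = index + 1 < bookmarks.Count
                    ? bookmarks[index + 1].PageNumber - 1
                    : totalPages
            })
            .ToList();
    }
}
```

The indexed Select overload is part of the original LINQ release, so this should also work on .NET 3.5 without Zip or Tuple.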
A good reference for LINQ is 101 LINQ Samples, where you can see other indexed Select samples: http://code.msdn.microsoft.com/101-LINQ-Samples-3fb9811b
I have a
List<string>
with 1500 strings. I am now using the following code to pull out only the strings that start with the string prefixText.
foreach(string a in <MYLIST>)
{
if(a.StartsWith(prefixText, true, null))
{
newlist.Add(a);
}
}
This is pretty fast, but I'm looking for google fast. Now my question is if I arrange the List in alphabetical order, then compare char by char can I make this faster? Or any other suggestions on making this faster?
Since 1500 is not really a huge number, a binary search on a sorted list would probably be enough.
Nevertheless most efficient algorithms for prefix search are based on the data structure named Trie or Prefix Tree. See: http://en.wikipedia.org/wiki/Trie
Following picture demonstrates the idea very briefly:
For c# implementation see for instance .NET DATA STRUCTURES FOR PREFIX STRING SEARCH AND SUBSTRING (INFIX) SEARCH TO IMPLEMENT AUTO-COMPLETION AND INTELLI-SENSE
You can use PLINQ (Parallel LINQ) to make the execution faster:
var newList = list.AsParallel().Where(x => x.StartsWith(prefixText)).ToList();
If you have the list in alphabetical order, you can use a variation of binary search to make it a lot faster.
As a starting point, this will return the index of one of the strings that match the prefix, so then you can look forward and backward in the list to find the rest:
public static int BinarySearchStartsWith(List<string> words, string prefix, int min, int max) {
while (max >= min) {
int mid = (min + max) / 2;
int comp = String.Compare(words[mid].Substring(0, prefix.Length), prefix);
if (comp < 0) {
min = mid + 1;
} else if (comp > 0) {
max = mid - 1;
} else {
return mid;
}
}
return -1;
}
int index = BinarySearchStartsWith(theList, "pre", 0, theList.Count - 1);
if (index == -1) {
// not found
} else{
// found
}
Note: If you use a prefix that is longer than any of the strings that are compared, it will break, so you might need to figure out how you want to handle that.
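One way to sidestep both the Substring issue and the forward/backward scan is to binary-search for the first entry that is not less than the prefix and then read forward while entries still match. This is a sketch under assumptions: the list is sorted with ordinal (case-sensitive) comparison, and PrefixSearch and FindByPrefix are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSearch
{
    // Assumes sortedWords is sorted with StringComparer.Ordinal.
    public static List<string> FindByPrefix(List<string> sortedWords, string prefix)
    {
        // Lower bound: first index whose word is >= prefix in ordinal order.
        int lo = 0, hi = sortedWords.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(sortedWords[mid], prefix) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }

        // All matches are contiguous starting at lo; stop at the first non-match.
        var result = new List<string>();
        for (int i = lo;
             i < sortedWords.Count && sortedWords[i].StartsWith(prefix, StringComparison.Ordinal);
             i++)
        {
            result.Add(sortedWords[i]);
        }
        return result;
    }
}
```

No Substring call is involved, so a prefix longer than some of the words simply yields no match for those words. For case-insensitive matching, the list would need to be sorted and compared with a matching case-insensitive comparer instead.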
Many approaches were analyzed to achieve minimum data capacity and high performance. The one that came out first: all prefixes are stored in a dictionary, where the key is a prefix and the value is the list of items matching that prefix.
Here is a simple implementation of this algorithm:
public class Trie<TItem>
{
#region Constructors
public Trie(
IEnumerable<TItem> items,
Func<TItem, string> keySelector,
IComparer<TItem> comparer)
{
this.KeySelector = keySelector;
this.Comparer = comparer;
this.Items = (from item in items
from i in Enumerable.Range(1, this.KeySelector(item).Length)
let key = this.KeySelector(item).Substring(0, i)
group item by key)
.ToDictionary( group => group.Key, group => group.ToList());
}
#endregion
#region Properties
protected Dictionary<string, List<TItem>> Items { get; set; }
protected Func<TItem, string> KeySelector { get; set; }
protected IComparer<TItem> Comparer { get; set; }
#endregion
#region Methods
public List<TItem> Retrieve(string prefix)
{
return this.Items.ContainsKey(prefix)
? this.Items[prefix]
: new List<TItem>();
}
public void Add(TItem item)
{
var keys = (from i in Enumerable.Range(1, this.KeySelector(item).Length)
let key = this.KeySelector(item).Substring(0, i)
select key).ToList();
keys.ForEach(key =>
{
if (!this.Items.ContainsKey(key))
{
this.Items.Add(key, new List<TItem> { item });
}
else if (this.Items[key].All(x => this.Comparer.Compare(x, item) != 0))
{
this.Items[key].Add(item);
}
});
}
public void Remove(TItem item)
{
this.Items.Keys.ToList().ForEach(key =>
{
if (this.Items[key].Any(x => this.Comparer.Compare(x, item) == 0))
{
this.Items[key].RemoveAll(x => this.Comparer.Compare(x, item) == 0);
if (this.Items[key].Count == 0)
{
this.Items.Remove(key);
}
}
});
}
#endregion
}
1500 is usually too few:
You could search it in parallel with a simple divide and conquer of the problem: search each half of the list (or each third, quarter, and so on) in a different job/thread.
Or store the strings in a (not binary) tree instead. Lookups will be O(log n).
If the list is sorted in alphabetical order, you can do a binary search (much the same idea as the previous one).
You can accelerate a bit by comparing the first character before invoking StartsWith:
char first = prefixText[0];
foreach(string a in <MYLIST>)
{
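if (a.Length > 0 && char.ToUpperInvariant(a[0]) == char.ToUpperInvariant(first)) // pre-check ignores case too, since StartsWith is called with ignoreCase = true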
{
if(a.StartsWith(prefixText, true, null))
{
newlist.Add(a);
}
}
}
I assume that the really fastest way would be to generate a dictionary with all possible prefixes from your 1500 strings, effectively precomputing the results for all possible searches that will return non-empty. Your search would then be simply a dictionary lookup completing in O(1) time. This is a case of trading memory (and initialization time) for speed.
private IDictionary<string, string[]> prefixedStrings;
public void Construct(IEnumerable<string> strings)
{
this.prefixedStrings =
(
from s in strings
from i in Enumerable.Range(1, s.Length)
let p = s.Substring(0, i)
group s by p
).ToDictionary(
g => g.Key,
g => g.ToArray());
}
public string[] Search(string prefix)
{
string[] result;
if (this.prefixedStrings.TryGetValue(prefix, out result))
return result;
return new string[0];
}
Have you tried implementing a Dictionary and comparing the results? Or, if you do put the entries in alphabetical order, try a binary search.
The question to me is whether or not you'll need to do this one time or multiple times.
If you only find the StartsWithPrefix list one time, you can't get faster than leaving the original list as is and doing myList.Where(s => s.StartsWith(prefix)). This looks at every string one time, so it's O(n).
If you need to find the StartsWithPrefix list several times, or maybe you're going to want to add or remove strings to the original list and update the StartsWithPrefix list then you should sort the original list and use binary search. But this will be sort time + search time = O(n log n) + 2 * O(log n)
If you did the binary search method, you would find the indexes of the first occurrence of your prefix and the last occurrence via search. Then do mySortedList.Skip(n).Take(m-n) where n is first index and m is last index.
Edit:
Wait a minute, we're using the wrong tool for the job. Use a Trie! If you put all your strings into a Trie instead of the list, all you have to do is walk down the trie with your prefix and grab all the words underneath that node.
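To make the idea concrete, here is a minimal node-based trie sketch. PrefixTrie and its members are hypothetical names, matching is case-sensitive, and the order of the returned words is not guaranteed because child nodes are kept in a Dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public class PrefixTrie
{
    private class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord; // true if a stored word ends at this node
    }

    private readonly Node root = new Node();

    public void Add(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            Node next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new Node();
                node.Children.Add(c, next);
            }
            node = next;
        }
        node.IsWord = true;
    }

    public List<string> StartingWith(string prefix)
    {
        var results = new List<string>();
        Node node = root;
        // Walk down the trie along the prefix; bail out if a character is missing.
        foreach (char c in prefix)
        {
            if (!node.Children.TryGetValue(c, out node))
                return results;
        }
        Collect(node, new StringBuilder(prefix), results);
        return results;
    }

    // Depth-first walk that gathers every word below the given node.
    private static void Collect(Node node, StringBuilder current, List<string> results)
    {
        if (node.IsWord)
            results.Add(current.ToString());
        foreach (KeyValuePair<char, Node> pair in node.Children)
        {
            current.Append(pair.Key);
            Collect(pair.Value, current, results);
            current.Length--; // backtrack
        }
    }
}
```

Building the trie costs time proportional to the total length of all strings, so it pays off only when the same list is searched many times.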
I would go with using LINQ:
var query = list.Where(w => w.StartsWith(prefixText)).ToList();