Does Lists in C# support slicing like in Python? - c#

Sorry for such a basic question regarding lists, but do we have this feature in C#?
e.g. imagine this Python List:
a = ['a','b,'c']
print a[0:1]
>>>>['a','b']
Is there something like this in C#? I currently have the necessity to test some object properties in pairs. edit: pairs are always of two :P
Imagine a larger (python) list:
a = ['a','a','b','c','d','d']
I need to test for example if a[0] = a[1], and if a[1] = a[2] etc.
How this can be done in C#?
Oh, and a last question: what is the tag (here) i can use to mark some parts of my post as code?

You can use LINQ to create a lazily-evaluated copy of a segment of a list. What you can't do without extra code (as far as I'm aware) is take a "view" on an arbitrary IList<T>. There's no particular reason why this shouldn't be feasible, however. You'd probably want it to be a fixed size (i.e. prohibit changes via Add/Remove) and you could also make it optionally read-only - basically you'd just proxy various calls on to the original list.
Sounds like it might be quite useful, and pretty easy to code... let me know if you'd like me to do this.
Out of interest, does a Python slice genuinely represent a view, or does it take a copy? If you change the contents of the original list later, does that change the contents of the slice? If you really want a copy, the the LINQ solutions using Skip/Take/ToList are absolutely fine. I do like the idea of a cheap view onto a collection though...

I've been looking for something like Python-Slicing in C# with no luck.
I finally wrote the following string extensions to mimic the python slicing:
static class StringExtensions
{
public static string Slice(this string input, string option)
{
var opts = option.Trim().Split(':').Select(s => s.Length > 0 ? (int?)int.Parse(s) : null).ToArray();
if (opts.Length == 1)
return input[opts[0].Value].ToString(); // only one index
if (opts.Length == 2)
return Slice(input, opts[0], opts[1], 1); // start and end
if (opts.Length == 3)
return Slice(input, opts[0], opts[1], opts[2]); // start, end and step
throw new NotImplementedException();
}
public static string Slice(this string input, int? start, int? end, int? step)
{
int len = input.Length;
if (!step.HasValue)
step = 1;
if (!start.HasValue)
start = (step.Value > 0) ? 0 : len-1;
else if (start < 0)
start += len;
if (!end.HasValue)
end = (step.Value > 0) ? len : -1;
else if (end < 0)
end += len;
string s = "";
if (step < 0)
for (int i = start.Value; i > end.Value && i >= 0; i+=step.Value)
s += input[i];
else
for (int i = start.Value; i < end.Value && i < len; i+=step.Value)
s += input[i];
return s;
}
}
Examples of how to use it:
"Hello".Slice("::-1"); // returns "olleH"
"Hello".Slice("2:-1"); // returns "ll"

Related

C# Finding the index of a number in an array using recursion

So in the programming course I'm taking we learned about recursion. I got an assignment to write recursive function that gets sorted array and a number and return the index of that number in the array if existed.
I didn't quite understand yet the subject of recursion so I need a little help with my code. I think i'm at the right direction but again, I'm a little struggling with the subject so I thought I could find guidance and help here.
this is the code I have at the moment:
private static int arrayIndexValue(int[] arr, int ind)
{
if (ind > arr.Length/2)
{
return arrayIndexValue(arr.Length/2, ind)
}
else if (ind < arr.Length/2)
{
return arrayIndexValue(arr.Length/2)
}
}
basically what i wanted to write here is something like this:
if the number the user inserts is smaller then the middle of the array, continue with the function but with the array cut in half (Binary search)
same if the number is bigger (i suggested to use my function with something like the binary search but as you can see i dont quite know how to apply it to my code)
Recursion works by breaking the entire problem into smaller parts. If you are at the smallest part (meaning that your array has only one value) then you would have your solution. This would also be your first case that you have to handle.
if (arr.Length == 1) return arr[0];
Now you can start to break down the problem. Binary left or right decision. As you already wrote you want to check weather the number is left or right from the middle of your array. So in your condition you need to access the element in the array using the [ ] operator:
if (ind > arr[arr.Length / 2]) // check if larger than the middle element
To extract a part of the array you need a second array in which you can copy the content that you want. Since you intend to pass only half of the current array into the recursive call you also need only half of the size.
int [] temp = new int[arr.Length / 2];
Now you can copy the part that you desire (first or second) depending on your condition and keep searching with a recursive call.
To copy the Array you can use the Arra.Copy method. You can pass the start and length to this method and it will copy the left or the right part into the int [] temp array. Here is an example of how to copy the right part:
Array.Copy(arr, arr.Length / 2, temp, 0, arr.Length / 2);
In the end you will need to fire your recursive call. This call will expect the new array and still the same number to look for. So since you copied either the left or right half of the previous array, now you pass the new short array to the recursive call:
return arrayIndexValue(temp, ind);
And there you have it
i got an assignment to write recursive function that gets sorted array
and a number and return the index of that number in the array if
existed.
In your method you check indexes, not values - you should compare values of elements, not indexes.
To write recursive function to find element you should pass array, element to find, start index and end index. Start index and end index will be used for finding in part of array.
Your method will be something like this:
private static int GetIndex(int[] arr, int element, int startIndex, int endIndex)
{
if (startIndex > endIndex)
{
return -1; //not found
}
var middleIndex = (startIndex + endIndex) / 2;
if (element == arr[middleIndex])
{
return middleIndex;
}
if (element < arr[middleIndex])
{
return GetIndex(arr, element, startIndex, middleIndex - 1);
}
else
{
return GetIndex(arr, element, middleIndex + 1, endIndex);
}
}
and to get some index:
static void Main(String[] args)
{
var arr = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
var element = 5;
var index = GetIndex(arr, element , 0, arr.Length - 1);
}
So the problem you have right now is that you haven't broken your problem into repeating steps. It actually should be very similar flow to a loop without all the variables. I want to do something a little simpler first.
void int find(int[] values, int at, int value) {
if (values.Length >= at || at < 0) {
return -1;
} else if (values[at] == value) {
return at;
}
return find(values, at+1, value);
}
This function just moves a index up one every time via recursion, but we can also do something very similar with a loop.
void int find(int[] values, int at, int value) {
while (values.Length < at && at >= 0) {
if (values[at] == value) {
return at;
}
at = at + 1;
}
return -1;
}
The whole reason we usually talk about recursion is that it is usually used so that you have no "mutating" state. The variables and there values are the same for every value. You can break this, but it is usually considered beneficial. A quick stab at your problem produces something like
void int find(int[] sortedValues, int start, int end, int value) {
var midIndex = (start + end) / 2;
var mid = sortedValues[midIndex];
if (start == end) {
return mid == value ? midIndex : -1;
} else if (mid > value) {
return find(sortedValues, mid, end, value);
} else if (mid < value) {
return find(sortedValues, start, mid, value);
} else {
return midIndex;
}
}
However, this isn't perfect. It has known issues like edge cases that would cause a crash. It should definitely be cleaned up a little. If you want to really want to dip your toe into recursion, try a purely functional language where you cannot mutate, like Haskell, Elm, or maybe something a little impure like F#, Clojure, or Scala. Anyway have fun.

Removing masked entries from an array

The task is to keep an array of objects untouched if input is null and, otherwise, remove the elements that are on positions specified by the input. I've got it working but I'm vastly dissatisfied with the code quality.
List<Stuff> stuff = new List<Stuff>{ new Stuff(1), new Stuff(2), new Stuff(3) };
String input = "5";
if(input == null)
return stuff;
int mask = Int32.Parse(input);
for (int i = stuff.Count - 1; i >= 0; i--)
if ((mask & (int)Math.Pow(2, i)) == 0)
stuff.RemoveAt(i);
return stuff;
The actual obtaining input and the fact that e.g. String.Empty will cause problems need not to be regarded. Let's assume that those are handled.
How can I make the code more efficient?
How can I make the syntax more compact and graspable?
Instead of the backwards running loop, you could use Linq with the following statement.
stuff = stuff.Where( (iStuff, idx) => (mask & (int)Math.Pow(2, idx)) != 0 );
Or even cooler using bitwise shit.
stuff = stuff.Where((_, index) => (mask >> index & 1) == 1);
It uses an overload of Where which can access the position in the sequence, as documented here. For a similar task, there is also an overload of Select which gives access to the index, as documented here.
Untested, but you could make an extension method that iterates the collection and filters, returning matching elements as it goes. Repeatedly bit-shifting the mask and checking the 0th bit seems the easiest to follow - for me at least.
static IEnumerable<T> TakeMaskedItemsByIndex(this IEnumerable<T> collection, ulong mask)
{
foreach (T item in collection)
{
if((mask & 1) == 1)
yield return item;
mask = mask >> 1;
}
}

Parsing string and generating all possible combinations in multiple strings

In my SQL Server database I have strings stored representing the correct solution to a question. In this string a certain format can be used to represent multiple correct solutions. The format:
possible-text [posibility-1/posibility-2] possible-text
This states either posibility-1 or posibility-2 is correct. There is no limit on how many possibilities there are (e.g. [ pos-1 / pos-2 / pos-3 / ... ] is possible).
However, a possibility can be null, e.g.:
I am [un/]certain.
This means the answer could be "I am certain" or "I am uncertain".
The format can also be nested in a sentence, e.g.:
I am [[un/]certain/[un/]sure].
The format can also occur multiple times in one sentence, e.g.:
[I am/I'm] [[un/]certain/[/un]sure].
What I want is to generate all the possible combinations. E.g. the above expression should return:
I am uncertain.
I am certain.
I am sure.
I am unsure.
I'm uncertain.
I'm certain.
I'm sure.
I'm unsure.
There is no limit on the nesting, nor the amount of possibilities. If there is only one possible solution then it will have not be in the above format. I'm not sure on how to do this.
I have to write this in C#. I think a possible solution could be to write a regex expression that can capture the [ / ] format and return me the possible solutions in a list (for every []-pair) and then generating the possible solutions by going over them in a stack-style way (some sort of recursion and backtracking style), but I'm not to a working solution yet.
I'm at a loss on to how exactly start on this. If somebody could give me some pointers on how to tackle this problem I'd appreciate it. When I find something I'll add it here.
Note: I noticed there are a lot of similar questions, however the solutions all seem to be specific to the particular problem and I think not applicable to my problem. If however I'm wrong, and you remember a previously answered question that can solve this, could you then tell me? Thanks in advance.
Update: Just to clarify if it was unclear. Every line in code is possible input. So this whole line is input:
[I am/I'm] [[un/]certain/[/un]sure].
This should work. I didn't bother optimizing it or doing error checking (in case the input string is malformed).
class Program
{
static IEnumerable<string> Parts(string input, out int i)
{
var list = new List<string>();
int level = 1, start = 1;
i = 1;
for (; i < input.Length && level > 0; i++)
{
if (input[i] == '[')
level++;
else if (input[i] == ']')
level--;
if (input[i] == '/' && level == 1 || input[i] == ']' && level == 0)
{
if (start == i)
list.Add(string.Empty);
else
list.Add(input.Substring(start, i - start));
start = i + 1;
}
}
return list;
}
static IEnumerable<string> Combinations(string input, string current = "")
{
if (input == string.Empty)
{
if (current.Contains('['))
return Combinations(current, string.Empty);
return new List<string> { current };
}
else if (input[0] == '[')
{
int end;
var parts = Parts(input, out end);
return parts.SelectMany(x => Combinations(input.Substring(end, input.Length - end), current + x)).ToList();
}
else
return Combinations(input.Substring(1, input.Length - 1), current + input[0]);
}
static void Main(string[] args)
{
string s = "[I am/I'm] [[un/]certain/[/un]sure].";
var list = Combinations(s);
}
}
You should create a parser that read character by character and builds up a logical tree of the sentence. When you have the tree it is easy to generate all possible combinations. There are several lexical parsers available that you could use, for example ANTLR: http://programming-pages.com/2012/06/28/antlr-with-c-a-simple-grammar/

Interpolation in c# - performance problem

I need to resample big sets of data (few hundred spectra, each containing few thousand points) using simple linear interpolation.
I have created interpolation method in C# but it seems to be really slow for huge datasets.
How can I improve the performance of this code?
public static List<double> interpolate(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
double[] interpolated = new double[breaks.Count];
int id = 1;
int x = 0;
while(breaks[x] < xItems[0])
{
interpolated[x] = yItems[0];
x++;
}
double p, w;
// left border case - uphold the value
for (int i = x; i < breaks.Count; i++)
{
while (breaks[i] > xItems[id])
{
id++;
if (id > xItems.Count - 1)
{
id = xItems.Count - 1;
break;
}
}
System.Diagnostics.Debug.WriteLine(string.Format("i: {0}, id {1}", i, id));
if (id <= xItems.Count - 1)
{
if (id == xItems.Count - 1 && breaks[i] > xItems[id])
{
interpolated[i] = yItems[yItems.Count - 1];
}
else
{
w = xItems[id] - xItems[id - 1];
p = (breaks[i] - xItems[id - 1]) / w;
interpolated[i] = yItems[id - 1] + p * (yItems[id] - yItems[id - 1]);
}
}
else // right border case - uphold the value
{
interpolated[i] = yItems[yItems.Count - 1];
}
}
return interpolated.ToList();
}
Edit
Thanks, guys, for all your responses. What I wanted to achieve, when I wrote this questions, were some general ideas where I could find some areas to improve the performance. I haven't expected any ready solutions, only some ideas. And you gave me what I wanted, thanks!
Before writing this question I thought about rewriting this code in C++ but after reading comments to Will's asnwer it seems that the gain can be less than I expected.
Also, the code is so simple, that there are no mighty code-tricks to use here. Thanks to Petar for his attempt to optimize the code
It seems that all reduces the problem to finding good profiler and checking every line and soubroutine and trying to optimize that.
Thank you again for all responses and taking your part in this discussion!
public static List<double> Interpolate(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
var a = xItems.ToArray();
var b = yItems.ToArray();
var aLimit = a.Length - 1;
var bLimit = b.Length - 1;
var interpolated = new double[breaks.Count];
var total = 0;
var initialValue = a[0];
while (breaks[total] < initialValue)
{
total++;
}
Array.Copy(b, 0, interpolated, 0, total);
int id = 1;
for (int i = total; i < breaks.Count; i++)
{
var breakValue = breaks[i];
while (breakValue > a[id])
{
id++;
if (id > aLimit)
{
id = aLimit;
break;
}
}
double value = b[bLimit];
if (id <= aLimit)
{
var currentValue = a[id];
var previousValue = a[id - 1];
if (id != aLimit || breakValue <= currentValue)
{
var w = currentValue - previousValue;
var p = (breakValue - previousValue) / w;
value = b[id - 1] + p * (b[id] - b[id - 1]);
}
}
interpolated[i] = value;
}
return interpolated.ToList();
}
I've cached some (const) values and used Array.Copy, but I think these are micro optimization that are already made by the compiler in Release mode. However You can try this version and see if it will beat the original version of the code.
Instead of
interpolated.ToList()
which copies the whole array, you compute the interpolated values directly in the final list (or return that array instead). Especially if the array/List is big enough to qualify for the large object heap.
Unlike the ordinary heap, the LOH is not compacted by the GC, which means that short lived large objects are far more harmful than small ones.
Then again: 7000 doubles are approx. 56'000 bytes which is below the large object threshold of 85'000 bytes (1).
Looks to me you've created an O(n^2) algorithm. You are searching for the interval, that's O(n), then probably apply it n times. You'll get a quick and cheap speed-up by taking advantage of the fact that the items are already ordered in the list. Use BinarySearch(), that's O(log(n)).
If still necessary, you should be able to do something speedier with the outer loop, what ever interval you found previously should make it easier to find the next one. But that code isn't in your snippet.
I'd say profile the code and see where it spends its time, then you have somewhere to focus on.
ANTS is popular, but Equatec is free I think.
few suggestions,
as others suggested, use profiler to understand better where time is used.
the loop
while (breaks[x] < xItems[0])
could cause exception if x grows bigger than number of items in "breaks" list. You should use something like
while (x < breaks.Count && breaks[x] < xItems[0])
But you might not need that loop at all. Why treat the first item as special case, just start with id=0 and handle the first point in for(i) loop. I understand that id might start from 0 in this case, and [id-1] would be negative index, but see if you can do something there.
If you want to optimize for speed then you sacrifice memory size, and vice versa. You cannot usually have both, except if you make really clever algorithm. In this case, it would mean to calculate as much as you can outside loops, store those values in variables (extra memory) and use them later. For example, instead of always saying:
id = xItems.Count - 1;
You could say:
int lastXItemsIndex = xItems.Count-1;
...
id = lastXItemsIndex;
This is the same suggestion as Petar Petrov did with aLimit, bLimit....
next point, your loop (or the one Petar Petrov suggested):
while (breaks[i] > xItems[id])
{
id++;
if (id > xItems.Count - 1)
{
id = xItems.Count - 1;
break;
}
}
could probably be reduced to:
double currentBreak = breaks[i];
while (id <= lastXIndex && currentBreak > xItems[id]) id++;
and the last point I would add is to check if there is some property in your samples that is special for your problem. For example if xItems represent time, and you are sampling in regular intervals, then
w = xItems[id] - xItems[id - 1];
is constant, and you do not have to calculate it every time in the loop.
This is probably not often the case, but maybe your problem has some other property which you could use to improve performance.
Another idea is this: maybe you do not need double precision, "float" is probably faster because it is smaller.
Good luck
System.Diagnostics.Debug.WriteLine(string.Format("i: {0}, id {1}", i, id));
I hope it's release build without DEBUG defined?
Otherwise, it might depend on what exactly are those IList parameters. May be useful to store Count value instead of accessing property every time.
This is the kind of problem where you need to move over to native code.

What is the most efficient (read time) string search method? (C#)

I find that my program is searching through lots of lengthy strings (20,000+) trying to find a particular unique phrase.
What is the most efficent method for doing this in C#?
Below is the current code which works like this:
The search begins at startPos because the target area is somewhat removed from the start
It loops through the string, at each step it checks if the substring from that point starts with the startMatchString, which is an indicator that the start of the target string has been found. (The length of the target string varys).
From here it creates a new substring (chopping off the 11 characters that mark the start of the target string) and searches for the endMatchString
I already know that this is a horribly complex and possibly very inefficent algorithm.
What is a better way to accomplish the same result?
string result = string.Empty;
for (int i = startPos; i <= response.Length - 1; i++)
{
if (response.Substring(i).StartsWith(startMatchString))
{
string result = response.Substring(i).Substring(11);
for (int j = 0; j <= result.Length - 1; j++)
{
if (result.Substring(j).StartsWith(endMatchString))
{
return result.Remove(j)
}
}
}
}
return result;
You can use String.IndexOf, but make sure you use StringComparison.Ordinal or it may be one order of magnitude slower.
private string Search2(int startPos, string startMatchString, string endMatchString, string response) {
int startMarch = response.IndexOf(startMatchString, startPos, StringComparison.Ordinal);
if (startMarch != -1) {
startMarch += startMatchString.Length;
int endMatch = response.IndexOf(endMatchString, startMarch, StringComparison.Ordinal);
if (endMatch != -1) { return response.Substring(startMarch, endMatch - startMarch); }
}
return string.Empty;
}
Searching 1000 times a string at about the 40% of a 183 KB file took about 270 milliseconds. Without StringComparison.Ordinal it took about 2000 milliseconds.
Searching 1 time with your method took over 60 seconds as it creates a new string (O(n)) each iteration, making your method O(n^2).
There are a whole bunch of algorithms,
boyer and moore
Sunday
Knuth-Morris-Pratt
Rabin-Karp
I would recommend to use the simplified Boyer-Moore, called Boyer–Moore–Horspool.
The C-code appears at the wikipedia.
For the java code look at
http://www.fmi.uni-sofia.bg/fmi/logic/vboutchkova/sources/BoyerMoore_java.html
A nice article about these is available under
http://www.ibm.com/developerworks/java/library/j-text-searching.html
If you want to use built-in stuff go for regular expressions.
It depends on what you're trying to find in the string. If you're looking for a specific sequence IndexOf/Contains are fast, but if you're looking for wild card patterns Regex is optimized for this kind of search.
I would try to use a Regular Expression instead of rolling my own string search algorithm. You can precompile the regular expression to make it run faster.
For very long strings you cannot beat the boyer-moore search algorithm. It is more complex than I might try to explain here, but The CodeProject site has a pretty good article on it.
You could use a regex; it’s optimized for this kind of searching and manipulation.
You could also try IndexOf ...
string result = string.Empty;
if (startPos >= response.Length)
return result;
int startingIndex = response.IndexOf(startMatchString, startPos);
int rightOfStartIndex = startingIndex + startMatchString.Length;
if (startingIndex > -1 && rightOfStartIndex < response.Length)
{
int endingIndex = response.IndexOf(endMatchString, rightOfStartIndex);
if (endingIndex > -1)
result = response.Substring(rightOfStartIndex, endingIndex - rightOfStartIndex);
}
return result;
Here's an example using IndexOf (beware: written from the top of my head, didn't test it):
int skip = 11;
int start = response.IndexOf(startMatchString, startPos);
if (start >= 0)
{
int end = response.IndexOf(startMatchString, start + skip);
if (end >= 0)
return response.Substring(start + skip, end - start - skip);
else
return response.Substring(start + skip);
}
return string.Empty;
As said before regex is your friend.
You might want to look at RegularExpressions.Group.
This way you can name part of the matched resultset.
Here is an example

Categories