string.split for alternating characters in c# - c#

I know this is going to be an easy answer, but I am new to C#. There are a lot of answers for different types of splits, but I cannot find one for this problem. I am trying to split a string into two by alternating characters.
The example would be: string example = "stackoverflow" and the output would be "sakvrlw" and "tcoefo".
If it is an odd number of characters it does not matter if the two new strings are of different lengths.
Thanks!

StringBuilder sb1 = new StringBuilder();
StringBuilder sb2 = new StringBuilder();
string source = "some string to split";
// ALTERNATE: if you want an explicitly typed char[]
// char[] source = "some string to split".ToCharArray();
for( int i = 0; i < source.Length; i++ )
{
    if( i % 2 == 0 )
    {
        sb1.Append( source[i] );
    }
    else
    {
        sb2.Append( source[i] );
    }
}
Notes:
There is no BCL method that I am aware of that does this automatically (e.g. string.SplitAlternating()).
Since the size of the string is known, the StringBuilders can be initialized with a fixed buffer size.
The LINQ solution (@usr's answer) is cleaner but slower (if it matters, which is probably rarely).
If performance truly does matter, the fastest way is probably to obtain a pointer to the start of the original char array and iterate over it while writing through two separate pointers into two char buffers declared using stackalloc. Those two buffers can then be passed to string's constructor. If you use the string(char*) overload, make sure to append null terminators; the string(char*, int, int) overload takes an explicit length instead.
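As a middle ground between the StringBuilder loop and the pointer approach in the notes above, here is a minimal sketch that preallocates two char arrays of exactly the right size. The class and method names (AlternatingSplitter, Split) are made up for illustration, and the tuple return type assumes C# 7.0 or later.

```csharp
using System;

public static class AlternatingSplitter
{
    // Hypothetical helper (not a BCL method): returns the characters at
    // even indices and the characters at odd indices as two strings.
    public static (string Even, string Odd) Split(string source)
    {
        // When the length is odd, the even-index half gets the extra character.
        char[] even = new char[(source.Length + 1) / 2];
        char[] odd = new char[source.Length / 2];

        for (int i = 0; i < source.Length; i++)
        {
            if (i % 2 == 0)
                even[i / 2] = source[i];
            else
                odd[i / 2] = source[i];
        }

        return (new string(even), new string(odd));
    }
}
```

Calling AlternatingSplitter.Split("stackoverflow") should give "sakvrlw" and "tcoefo". Whether this is measurably faster than the StringBuilder version would need benchmarking.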

var a = new string(source.Where((c,i) => i % 2 == 0).ToArray());
var b = new string(source.Where((c,i) => i % 2 != 0).ToArray());

Related

Efficient Replace Characters in a string from one array for another

The specific problem I have is that I have to replace the numbers in chemical formulae with the equivalent Unicode subscripts, so H2SO4 => H₂SO₄. (Those subscripts are not font adjustments, they are special unicode characters.)
So my initial cut was:
return unit.Replace("2", "₂").
Replace("3", "₃").
Replace("4", "₄").
Replace("5", "₅").
Replace("6", "₆").
Replace("7", "₇");
Which works, but obviously isn't particularly efficient. Any suggestions for a more optimal algorithm?
There are only 10 possible subscript characters that need replacement and most chemical formulas are not too long. For this reason, I think your implementation is not horribly inefficient and I would suggest benchmarking your code before trying to optimize it.
But here's my attempt to create a method that does what you need:
public string ToSubscriptFormula(string input)
{
var characters = input.ToCharArray();
for (var i = 0; i < characters.Length; i++)
{
switch (characters[i])
{
case '2':
characters[i] = '₂';
break;
case '3':
characters[i] = '₃';
break;
// case statements omitted
}
}
return new string(characters);
}
I would recommend avoiding the use of StringBuilder unless you're appending a large number of strings, as the overhead of creating an instance would actually make your code less efficient. See this post by Jon Skeet for a detailed explanation of when it should be used.
Also, given the limited number of case statements, I personally don't think using a Dictionary<char,char> would add any readability or performance benefit, but under different scenarios it might be useful to consider using one.
But if you really had to super-optimize your method, you could replace the case statement with the following code (thanks to andrew for the suggestion):
public string ToSubscriptFormula(string input)
{
var characters = input.ToCharArray();
const int distance = '₀' - '0'; // distance of subscript from digit
for (var i = 0; i < characters.Length; i++)
{
if(char.IsDigit(characters[i]))
{
characters[i] = (char) (characters[i] + distance);
}
}
return new string(characters);
}
The trick here is that all subscript characters are successive and that casting an int to char will give you the corresponding character.
Finally, as @nwellnhof has suggested in the comments, char.IsDigit() would return true for some non-Latin digit characters in the Unicode Nd category.
If your chemical formula might contain such characters, the check should be replaced with c >= '0' && c <= '9'. This will probably be slightly faster than char.IsDigit, but I'm not sure if it would make a difference in most practical scenarios.
I would be tempted to do something like this:
public string replace(string input)
{
StringBuilder sb = new StringBuilder();
Dictionary<char, char> map = new Dictionary<char, char>();
map.Add('2', '₂');
map.Add('3', '₃');
map.Add('4', '₄');
map.Add('5', '₅');
map.Add('6', '₆');
map.Add('7', '₇');
char tmp;
foreach(char c in input)
{
if (map.TryGetValue(c, out tmp))
sb.Append(tmp);
else
sb.Append(c);
}
return sb.ToString();
}
The Dictionary is defined inside the method here for simplicity, but should be defined somewhere else in scope.
So, very simply, iterate the input string only once. For every character, find the matching Dictionary entry if it exists, and append either that or the original character to a StringBuilder in order to avoid creating multiple string objects.
My first thought was what about formulae with balancing prefix numbers:
E.g. 2H₂(g) + O₂(g) → 2H₂O(g)
Presumably you don't want this to replace the leading numbers?
Also, I'm not sure why it is mentioned above that only 8 digits (or even only 6 digits) need replacement - aren't all digits required (0-9)? Sure, you don't have 0 and 1 by themselves, but you need them for, e.g., 10.
Anyway, notwithstanding the above (which I didn't attempt to implement since it wasn't the question), avoiding StringBuilder and operating on a char array seemed to make sense, and I preferred to avoid a large switch statement.
public class Program
{
public static void Main()
{
Console.WriteLine(SubscriptNums("C6H12O6"));
}
public static string SubscriptNums(string input)
{
char[] replacementChars = { '₀', '₁', '₂', '₃', '₄', '₅', '₆', '₇', '₈', '₉' };
int zeroCharIndex = (int)'0';
char[] inputCharArray = input.ToCharArray();
for(int i = 0; i < inputCharArray.Length; i++)
{
if (inputCharArray[i] >= '0' && inputCharArray[i] <= '9')
{
inputCharArray[i] = replacementChars[(int)inputCharArray[i] - zeroCharIndex];
}
}
return new string(inputCharArray);
}
}
Edit 1 - removed magic number for numeric value of '0'.
Edit 2 - removed use of IsDigit.
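The concern about balancing prefix numbers was not implemented above. One possible way to handle it, sketched here under a simplifying assumption: a digit is treated as an atom count only if it directly follows a letter, a closing parenthesis, or another digit that was already subscripted. The names FormulaSubscripts and SubscriptCounts are hypothetical, and the rule is a heuristic that has not been checked against charges, hydrates, or other notation.

```csharp
using System;

public static class FormulaSubscripts
{
    // Hypothetical helper: subscripts digits that follow an element symbol,
    // a ')' or an already-subscripted digit; leading coefficients are kept.
    public static string SubscriptCounts(string input)
    {
        const int distance = '₀' - '0'; // offset from ASCII digit to subscript digit
        char[] chars = input.ToCharArray();
        bool previousWasSubscripted = false;

        for (int i = 0; i < chars.Length; i++)
        {
            char c = chars[i];
            bool isAsciiDigit = c >= '0' && c <= '9';
            bool followsElementOrGroup =
                i > 0 && (char.IsLetter(input[i - 1]) || input[i - 1] == ')');

            if (isAsciiDigit && (followsElementOrGroup || previousWasSubscripted))
            {
                chars[i] = (char)(c + distance);
                previousWasSubscripted = true;
            }
            else
            {
                previousWasSubscripted = false;
            }
        }

        return new string(chars);
    }
}
```

With this rule, "2H2O" keeps its leading 2 and becomes "2H₂O", while "C6H12O6" becomes "C₆H₁₂O₆".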
You could iterate over the string and check each char. If it has a replacement, append the corresponding character to the StringBuilder; if not, append the original character. This way, you only have to iterate over the string once, and not once for each replacement. Furthermore, as strings are immutable, each call of String.Replace() creates a new copy of the string for the result, which immediately becomes garbage for the GC to collect.
StringBuilder sb = new StringBuilder();
for (int i = 0; i < unit.Length; i++) {
switch(unit[i]) {
case '2': sb.Append('₂'); break;
case '3': sb.Append('₃'); break;
...
default: sb.Append(unit[i]); break;
}
}
output = sb.ToString();
You could also introduce a replacement dictionary, as Abdullah Nehir suggested:
StringBuilder sb = new StringBuilder();
Dictionary<char, char> replacements = new Dictionary<char, char>();
//put in the pairs
for (int i = 0; i < unit.Length; i++) {
    if (replacements.ContainsKey(unit[i]))
        sb.Append(replacements[unit[i]]);
    else
        sb.Append(unit[i]);
}
Instead of accessing the values via index, you can also iterate the string with a foreach loop
foreach (char c in unit) {
if (replacements.ContainsKey(c))
sb.Append(replacements[c]);
else
sb.Append(c);
}
If you were looking for some elegant code where you don't have to type string.Replace for each character, then this would help you:
public static string Replace(string input)
{
char[] inputCharArr = input.ToCharArray();
StringBuilder sb = new StringBuilder();
foreach (var c in inputCharArr)
{
int intC = (int)c;
//If the digit was a number ([0-9] are [48-57] in unicode),
//replace the old char with the new char
//(8272 when added to the unicode of [0-9] gives the desired result)
if (intC > 47 && intC < 58)
sb.Append((char)(intC + 8272));
else sb.Append(c);
}
return sb.ToString();
}
See the edit history if you wonder what the comments are talking about.

How to store every 2 characters in c#

Is there a way to store every 2 characters in a string?
e.g.
1+2-3-2-3+
So it would be "1+", "2-", "3-", "2-", "3+" as separate strings or in an array.
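The simplest way would be to walk your string with a loop, and take two-character substrings from the current position (the Math.Min guard keeps Substring from throwing on a trailing single character when the length is odd):
var res = new List<string>();
for (int i = 0 ; i < str.Length ; i += 2)
    res.Add(str.Substring(i, Math.Min(2, str.Length - i)));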
An advanced solution would do the same thing with LINQ, and avoid an explicit loop:
var res = Enumerable
.Range(0, str.Length/2)
.Select(i => str.Substring(2*i, 2))
.ToList();
The second solution is somewhat more compact, but it is harder to understand, at least to someone not closely familiar with LINQ.
This is a good problem for a regular expression. You could try:
\d[+-]
Just find how to compile that regular expression (HINT) and call a method that returns all occurrences.
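For reference, a minimal sketch of what that regular-expression approach could look like with Regex.Matches from System.Text.RegularExpressions. The class and method names (PairExtractor, ExtractPairs) are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PairExtractor
{
    // Hypothetical helper: returns every "digit followed by + or -" pair,
    // in the order the pairs appear in the input.
    public static List<string> ExtractPairs(string input)
    {
        var pairs = new List<string>();
        foreach (Match match in Regex.Matches(input, @"\d[+-]"))
        {
            pairs.Add(match.Value);
        }
        return pairs;
    }
}
```

For the input "1+2-3-2-3+" this should yield "1+", "2-", "3-", "2-", "3+". Unlike the fixed-width loop, it skips anything that does not match the digit-plus-sign pattern.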
Use a for loop, and extract the characters using the string.Substring() method, ensuring you do not go over the length of the string.
e.g.
string x = "1+2-3-2-3+";
const int LENGTH_OF_SPLIT = 2;
for (int i = 0; i < x.Length; i += LENGTH_OF_SPLIT)
{
    string temp = null; // temporary storage, that will contain the characters
    // if index (i) + the length of the split is at most the
    // length of the string, then we will not go out of bounds (i.e.
    // there are enough characters left to extract)
    if ((LENGTH_OF_SPLIT + i) <= x.Length)
    {
        temp = x.Substring(i, LENGTH_OF_SPLIT);
    }
    // otherwise, we'll break out of the loop
    // or just extract the rest of the string, or do something else
    else
    {
        // you can possibly just make temp equal to the rest of the characters
        // i.e.
        // temp = x.Substring(i);
        break; // break out of the loop, since we're over the length of the string
    }
    // use temp
    // e.g.
    // Print it out, or put it in a list
    // Console.WriteLine(temp);
}

How to split a number into individual digits in c#? [duplicate]

Say I have 12345.
I'd like individual items for each number. A String would do or even an individual number.
Does the .Split method have an overload for this?
I'd use modulus and a loop.
int[] GetIntArray(int num)
{
List<int> listOfInts = new List<int>();
while(num > 0)
{
listOfInts.Add(num % 10);
num = num / 10;
}
listOfInts.Reverse();
return listOfInts.ToArray();
}
Something like this will work, using Linq:
string result = "12345";
var intList = result.Select(digit => int.Parse(digit.ToString()));
This will give you an IEnumerable<int>.
If you want an IEnumerable of strings:
var stringList = result.Select(digit => digit.ToString());
or if you want a List of strings:
var stringList = result.Select(digit => digit.ToString()).ToList();
Well, a string is an IEnumerable and also implements an indexer, so you can iterate through it or reference each character in the string by index.
The fastest way to get what you want is probably the ToCharArray() method of a String:
var myString = "12345";
var charArray = myString.ToCharArray(); //{'1','2','3','4','5'}
You can then convert each Char to a string, or parse them into bytes or integers. Here's a Linq-y way to do that:
byte[] byteArray = myString.ToCharArray().Select(c=>byte.Parse(c.ToString())).ToArray();
A little more performant if you're using ASCII/Unicode strings:
byte[] byteArray = myString.ToCharArray().Select(c=>(byte)(c - '0')).ToArray();
That code will only work if you're SURE that each element is a number; otherwise the parsing will throw an exception (and the subtraction will silently produce garbage values). A simple Regex that will verify this is true is "^\d+$" (matches a full string consisting of one or more digit characters), used in the Regex.IsMatch() static method.
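A minimal sketch of that validate-then-convert idea follows. The names DigitSplitter and TryGetDigits are hypothetical. Note two deliberate deviations from the pattern quoted above: it uses [0-9] instead of \d, because in .NET \d also matches non-ASCII Unicode digits, and \z instead of $, because $ also matches just before a trailing newline.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitSplitter
{
    // Hypothetical helper: returns false (and an empty array) unless the
    // whole string consists of ASCII digits 0-9.
    public static bool TryGetDigits(string text, out int[] digits)
    {
        if (text == null || !Regex.IsMatch(text, @"^[0-9]+\z"))
        {
            digits = new int[0];
            return false;
        }

        // Subtracting '0' maps the characters '0'..'9' to the numbers 0..9.
        digits = text.Select(c => c - '0').ToArray();
        return true;
    }
}
```

For example, TryGetDigits("12345", out var digits) should return true with digits holding 1, 2, 3, 4, 5, while "12a45" should return false.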
You can simply do:
"123456".Select(q => new string(q,1)).ToArray();
To get an array of integers, as per a comment request, you can:
"123456".Select(q => int.Parse(new string(q,1))).ToArray();
It is a little weak since it assumes the string actually contains numbers.
Here is some code that might help you out. Strings can be treated as an array of characters:
string numbers = "12345";
int[] intArray = new int[numbers.Length];
for (int i = 0; i < numbers.Length; i++)
{
    // int.Parse takes a string, so convert the char first
    intArray[i] = int.Parse(numbers[i].ToString());
}
The Substring and Join methods can be used for this.
string no = "12345";
string[] numberArray = new string[no.Length];
int counter = 0;
for (int i = 0; i < no.Length; i++)
{
    numberArray[i] = no.Substring(counter, 1); // 1 is split length
    counter++;
}
Console.WriteLine(string.Join(" ", numberArray)); //output >>> 1 2 3 4 5

How do I generate a set of random strings in a C# program so that they are not trivially predicted?

I faced the following problem: generate N unique alphanumeric strings from a restricted alphabet. Here's my solution in C#:
string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
Random generator = new Random();
const int ToGenerate = 10000;
const int CharactersCount = 4;
ArrayList generatedStrings = new ArrayList();
while( generatedStrings.Count < ToGenerate ) {
string newString = "Prefix";
for( int i = 0; i < CharactersCount; i++ ) {
int index = generator.Next( Alphabet.Length );
char character = Alphabet[index];
newString += character;
}
if( !generatedStrings.Contains( newString ) ) {
generatedStrings.Add( newString );
}
}
for( int i = 0; i < generatedStrings.Count; i++ ) {
System.Console.Out.WriteLine( generatedStrings[i] );
}
It generates 10K strings starting with "Prefix" and otherwise consisting of capital letters and numbers. The output looks good.
Now I see the following problem. The produced strings are for a scenario where they should be unlikely to be predicted by anyone. In my program the seed is time-dependent. Once someone knows the seed value he can run the same code and get the exact same strings. If he knows any two strings he can easily figure out my algorithm (since it is really naive) and attempt to brute-force the seed value - just enumerate all possible seed values until he sees the two known strings in the output.
Is there some simple change that could be done to my code to make the described attack less possible?
Well, how would he know the seed? Unless he knew the exact time you ran the code, that is very hard to do. But if you need something stronger, you can also create cryptographically strong random numbers via System.Security.Cryptography.RandomNumberGenerator.Create - something like:
var rng = System.Security.Cryptography.RandomNumberGenerator.Create();
byte[] buffer = new byte[4];
char[] chars = new char[CharactersCount];
for(int i = 0 ; i < chars.Length ; i++)
{
rng.GetBytes(buffer);
int nxt = BitConverter.ToInt32(buffer, 0);
int index = nxt % Alphabet.Length;
if(index < 0) index += Alphabet.Length;
chars[i] = Alphabet[index];
}
string s = new string(chars);
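Putting that together with the uniqueness requirement from the question, here is a minimal sketch. It assumes .NET Core 3.0 or later, where the static RandomNumberGenerator.GetInt32 method is available; that method is a swap-in for the GetBytes/modulo code above, not part of the original answer. The names CodeGenerator and Generate are hypothetical. A HashSet<string> replaces the ArrayList so the duplicate check does not scan the whole list each time.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public static class CodeGenerator
{
    private const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Hypothetical helper: generates `count` distinct strings, each made of
    // `prefix` followed by `length` random characters from Alphabet.
    public static HashSet<string> Generate(int count, int length, string prefix)
    {
        // Guard against asking for more unique strings than can exist,
        // which would otherwise loop forever.
        if (count > Math.Pow(Alphabet.Length, length))
            throw new ArgumentOutOfRangeException(nameof(count));

        var result = new HashSet<string>();
        var chars = new char[length];
        while (result.Count < count)
        {
            for (int i = 0; i < chars.Length; i++)
            {
                // GetInt32 returns a value in [0, Alphabet.Length).
                chars[i] = Alphabet[RandomNumberGenerator.GetInt32(Alphabet.Length)];
            }
            result.Add(prefix + new string(chars)); // duplicates are ignored
        }
        return result;
    }
}
```

A call such as CodeGenerator.Generate(10000, 4, "Prefix") mirrors the numbers in the question. With only 36^4 (about 1.68 million) possible suffixes, the strings remain guessable by enumeration no matter how good the random source is, so a longer suffix may be worth considering.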
Well, it depends what you consider "simple".
You can "solve" your problem by using a "true" source of random numbers. You can try the free ones (random.org, fourmilab hotbits, etc), or buy one, depending on the sort of operation you're running.
Alternatively (and perhaps better) is to not generate in advance, and instead generate on demand. But this may be a significant change to your business process/model.

what's the C# equivalent of string$ from basic

And is there an elegant linqy way to do it?
What I want to do is create a string of a given length made up of repetitions of another string, up to that length.
So for length 9 and input string "xxx" I get "xxxxxxxxx" (i.e. length 9).
For a non-integral multiple I'd like to truncate the result.
I can do this using loops and a StringBuilder easily but I'm looking to see if the language can express this idea easily.
(FYI I'm making easter maths homework for my son)
No, nothing simple and elegant - you have to basically code this yourself.
You can construct a string with a number of repeated characters, but not repeated strings,
i.e.
string s = new string('#', 6); // s = "######"
To do this with strings, you would need a loop to concatenate them, and the easiest approach would then be to use Substring to truncate to the desired final length - along the lines of:
string FillString(string text, int count)
{
    StringBuilder s = new StringBuilder();
    // Append at least one copy more than needed, so the result is never shorter than count.
    for (int i = 0; i <= count / text.Length; i++)
        s.Append(text);
    return s.ToString().Substring(0, count);
}
A possible solution using Enumerable.Repeat.
const int TargetLength = 10;
string pattern = "xxx";
int repeatCount = TargetLength / pattern.Length + 1;
string result = String.Concat(Enumerable.Repeat(pattern, repeatCount).ToArray());
result = result.Substring(0, TargetLength);
Console.WriteLine(result);
Console.WriteLine(result.Length);
My Linqy (;)) solution would be to create an extension method. LINQ is language-integrated query, so why the abuse? I'm pretty sure it's possible with LINQ's Select, since you can create new (anonymous) objects, but why...?
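A minimal sketch of what such an extension method might look like. The names StringExtensions and RepeatToLength are made up for illustration; the body uses the StringBuilder.Append(string, int, int) overload to copy only as much of the pattern as still fits, so no final Substring is needed.

```csharp
using System;
using System.Text;

public static class StringExtensions
{
    // Hypothetical extension method: repeats `pattern` until the result is
    // exactly `length` characters long, truncating the last repetition.
    public static string RepeatToLength(this string pattern, int length)
    {
        if (string.IsNullOrEmpty(pattern))
            throw new ArgumentException("Pattern must not be empty.", nameof(pattern));
        if (length < 0)
            throw new ArgumentOutOfRangeException(nameof(length));

        var sb = new StringBuilder(length);
        while (sb.Length < length)
        {
            int remaining = length - sb.Length;
            // Append the whole pattern, or only the part that still fits.
            sb.Append(pattern, 0, Math.Min(pattern.Length, remaining));
        }
        return sb.ToString();
    }
}
```

Usage would then read "xxx".RepeatToLength(9), and a pattern that does not divide the length evenly is cut off, e.g. "abc".RepeatToLength(7) gives "abcabca".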
