We need to process a lot of strings, which contain many numbers (formatted as strings) and are separated by an arbitrary character.
Ex: numbers separated by a space:
var s = "1234.555 43434.43 434.436 85656.253 564.5656 <etc.>"
Now we are using String.Split to first create an array of strings (containing each number) and then parse these using double.Parse() to a numeric type.
var stringArr = s.Split(separator, StringSplitOptions.RemoveEmptyEntries);
// Some iteration/loop in which at some point this routine is called
var d = double.Parse(stringArr[i]);
But we notice this takes up a lot of resources (memory allocation and CPU), so we would like to create at least a less memory-consuming solution.
The first step is to enumerate the number occurrences in the string and then parse the substrings to a number.
For enumerating, we already found a nice solution on SO:
What are the alternatives to Split a string in c# that don't use String.Split()
However, this routine then uses Substring to read a parting string from the entire string, so I would expect this results (again) in an extra allocation of memory to be able to use/read that partial string...
If so (regarding memory allocation), is there a way to only parse the found part directly from the original string (without allocating a new string) to a numeric value...
I don't mind using unsafe code, if necessary... It will only be marked unsafe during the brief period the numbers are extracted.
separated by an arbitrary character
this forces you to either only handle integers, or to enforce some limitations on what characters can be used as separators and/or decimal separator. For example
1 234.567
might be interpreted as either a single number, two number, or three numbers depending on what characters you want to allow as separators. Note that the thousand separator and decimal separators depend on culture just to make things more complicated.
If you want to minimize memory usage, Double.Parse has a overload that takes a ReadOnlySpan<char> s as input. So just iterate over your string, once you encounter a number character you save the index and iterate to the next non-number character, and use Span<T>.Slice to create a span over the number digits and parse these. Continue until you have parsed all characters.
But you will still need to define what characters counts as part of a number or not. Note that you might want to Parse with the CultureInfo.InvariantCulture to get consistent behavior regardless of region.
I fiddled something.
Based on the assumption, that the "arbitrary" split char is known.
I also added the variant as entrypoint to a DataFlowPipeline, because ReadOnlySpan has some quirks with async and enumeration :D
It is a naive approach to give an idea. You may want to add some sanity checks etc.
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;
using System.Threading.Tasks;
public class Program
{
public static async Task Main()
{
var s = "1234.555 43434.43 434.436 85656.253 564.5656";
//foreach( var d in Parse(s, ' ') ) Console.WriteLine(d);
var ab = new ActionBlock<double>(d => Console.WriteLine(d));
ParseAndEnqueue(s, ' ', ab);
await ab.Completion;
}
public static IEnumerable<double> Parse(ReadOnlySpan<char> input, char splitChar)
{
var result = new List<double>();
int nextIndex = input.IndexOf(splitChar);
while( nextIndex >= 0 )
{
if(double.TryParse(input[..nextIndex], out double myVal)) result.Add(myVal);
if( nextIndex+1 >= input.Length ) break;
input = input[(nextIndex+1)..];
nextIndex = input.IndexOf(splitChar);
if( nextIndex < 0 )
if(double.TryParse(input, out double myLastVal)) result.Add(myLastVal);
}
return result;
}
public static void ParseAndEnqueue(ReadOnlySpan<char> input, char splitChar, ActionBlock<double> ab)
{
int nextIndex = input.IndexOf(splitChar);
while( nextIndex >= 0 )
{
if(double.TryParse(input[..nextIndex], out double myVal)) ab.Post(myVal);
if( nextIndex+1 >= input.Length ) break;
input = input[(nextIndex+1)..];
nextIndex = input.IndexOf(splitChar);
if( nextIndex < 0 )
if(double.TryParse(input, out double myLastVal)) ab.Post(myLastVal);
}
ab.Complete();
}
}
=> Fiddle
Am reading from a text file using the code below.
if (!allLines.Contains(":70"))
{
var firstIndex = allLines.IndexOf(":20");
var secondIndex = allLines.IndexOf(":23B");
var thirdIndex = allLines.IndexOf(":59");
var fourthIndex = allLines.IndexOf(":71A");
var fifthIndex = allLines.IndexOf(":72");
var sixthIndex = allLines.IndexOf("-}");
var firstValue = allLines.Substring(firstIndex + 4, secondIndex - firstIndex - 5).TrimEnd();
var secondValue = allLines.Substring(thirdIndex + 4, fourthIndex - thirdIndex - 5).TrimEnd();
var thirdValue = allLines.Substring(fifthIndex + 4, sixthIndex - fifthIndex - 5).TrimEnd();
var len1 = firstValue.Length;
var len2 = secondValue.Length;
var len3 = thirdValue.Length;
inflow103.REFERENCE = firstValue.TrimEnd();
pointer = 1;
inflow103.BENEFICIARY_CUSTOMER = secondValue;
inflow103.RECEIVER_INFORMATION = thirdValue;
}
else if (allLines.Contains(":70"))
{
var firstIndex = allLines.IndexOf(":20");
var secondIndex = allLines.IndexOf(":23B");
var thirdIndex = allLines.IndexOf(":59");
var fourthIndex = allLines.IndexOf(":70");
var fifthIndex = allLines.IndexOf(":71");
var sixthIndex = allLines.IndexOf(":72");
var seventhIndex = allLines.IndexOf("-}");
var firstValue = allLines.Substring(firstIndex + 4, secondIndex - firstIndex - 5).TrimEnd();
var secondValue = allLines.Substring(thirdIndex + 5, fourthIndex - thirdIndex - 5).TrimEnd();
var thirdValue = allLines.Substring(sixthIndex + 4, seventhIndex - sixthIndex - 5).TrimEnd();
var len1 = firstValue.Length;
var len2 = secondValue.Length;
var len3 = thirdValue.Length;
inflow103.REFERENCE = firstValue.TrimEnd();
pointer = 1;
inflow103.BENEFICIARY_CUSTOMER = secondValue;
inflow103.RECEIVER_INFORMATION = thirdValue;
}
Below is the format of the text file am reading.
{1:F21DBLNNGLAAXXX4695300820}{4:{177:1405260906}{451:0}}{1:F01DBLNNGLAAXXX4695300820}{2:O1030859140526SBICNGLXAXXX74790400761405260900N}{3:{103:NGR}{108:AB8144573}{115:3323774}}{4:
:20:SBICNG958839-2
:23B:CRED
:23E:SDVA
:32A:140526NGN168000000,
:50K:IHS PLC
:53A:/3000025296
SBICNGLXXXX
:57A:/3000024426
DBLNNGLA
:59:/0040186345
SONORA CAPITAL AND INVSTMENT LTD
:71A:OUR
:72:/CODTYPTR/001
-}{5:{MAC:00000000}{PAC:00000000}{CHK:42D0D867739F}}{S:{SPD:}{SAC:}{FAC:}{COP:P}}
The above file format represent one transaction in a single text file, but while testing with live files, I came accross a situation where a file can have more than one transaction. Example is the code below.
{1:F21DBLNNGLAAXXX4694300150}{4:{177:1405231923}{451:0}}{1:F01DBLNNGLAAXXX4694300150}{2:O1031656140523FCMBNGLAAXXX17087957771405231916N}{3:{103:NGR}{115:3322817}}{4:
:20:TRONGN3RDB16
:23B:CRED
:23E:SDVA
:26T:001
:32A:140523NGN1634150,00
:50K:/2206117013
SUNLEK INVESTMENT LTD
:53A:/3000024763
FCMBNGLA
:57A:/3000024426
DBLNNGLA
:59:/0022617678
GOLDEN DC INT'L LTD
:71A:OUR
:72:/CODTYPTR/001
//BNF/TRSF
-}{5:{MAC:00000000}{PAC:00000000}{CHK:C21000C4ECBA}{DLM:}}{S:{SPD:}{SAC:}{FAC:}{COP:P}}${1:F21DBLNNGLAAXXX4694300151}{4:{177:1405231923}{451:0}}{1:F01DBLNNGLAAXXX4694300151}{2:O1031656140523FCMBNGLAAXXX17087957781405231916N}{3:{103:NGR}{115:3322818}}{4:
:20:TRONGN3RDB17
:23B:CRED
:23E:SDVA
:26T:001
:32A:140523NGN450000,00
:50K:/2206117013
SUNLEK INVESTMENT LTD
:53A:/3000024763
FCMBNGLA
:57A:/3000024426
DBLNNGLA
:59:/0032501697
SUNSTEEL INDUSTRIES LTD
:71A:OUR
:72:/CODTYPTR/001
//BNF/TRSF
-}{5:{MAC:00000000}{PAC:00000000}{CHK:01C3B7B3CA53}{DLM:}}{S:{SPD:}{SAC:}{FAC:}{COP:P}}
My challenge is that in my code, while reading allLines, each line is identified by certain index, a situation where I need to pick up the second transaction from the file, and the same index exist like as before, how can I manage this situation.
This is a simple problem obscured by excess code. All you are doing is extracting 3 values from a chunk of text where the precise layout can vary from one chunk to another.
There are 3 things I think you need to do.
Refactor the code. Instead of two hefty if blocks inline, you need functions that extract the required text.
Use regular expressions. A single regular expression can extract the values you need in one line instead of several.
Separate the code from the data. The logic of these two blocks is identical, only the data changes. So write one function and pass in the regular expression(s) needed to extract the data items you need.
Unfortunately this calls for a significant lift in the abstraction level of the code, which may be beyond what you're ready for. However, if you can do this and (say) you have function Extract() with regular expressions as arguments, you can apply that function once, twice or as often as needed to handle variations in your basic transaction.
You may perhaps use the code below to achieve multiple record manipulation using your existing code
//assuming fileText is all the text read from the text file
string[] fileData = fileText.Split('$');
foreach(string allLines in fileData)
{
//your code goes here
}
Maybe indexing works, but given the particular structure of the format, I highly doubt that is a good solution. But if it works for you, then that's great. You can simply split on $ and then pass each substring into a method. This assures that the index for each substring starts at the beginning of the entry.
However, if you run into a situation where indices are no longer static, then before you even start to write a parser for any format, you need to first understand the format. If you don't have any documentation and are basically reverse engineering it, that's what you need to do. Maybe someone else has specifications. Maybe the source of this data has it somewhere. But I will proceed under the assumption that none of this information is available and you have been given a task with absolutely no support and are expected to reverse-engineer it.
Any format that is meant to be parsed and written by a computer will 9 out of 10 times be well-formed. I'd say 9.9 out of 10 for that matter, since there are cases where people make things unnecessarily complex for the sake of "security".
When I look at your sample data, I see "chunks" of data enclosed within curly braces, as well as nested chunks.
For example, you have things like
{tag1:value1} // simple chunk
{tag2:{tag3: value3}{tag4:value4}} // nested chunk
Multiple transactions are delimited by a $ apparently. You may be able to split on $ signs in this case, but again you need to be sure that the $ is a special character and doesn't appear in tags or values themselves.
Do not be fixated on what a "chunk" is or why I use the term. All you need to know is that there are "tags" and each tag comes with a particular "value".
The value can be anything: a primitive such as a string or number, or it can be another chunk. This suggests that you need to first figure out what type of value each tag accepts. For example, the 1 tag takes a string. The 4 tag takes multiple chunk, possibly representing different companies. There are chunks like DLM that have an empty value.
From these two samples, I would assume that you need to consume each each chunk, check the tag, and then parse the value. Since there are nested chunks, you likely need to store them in a particular way to correctly handle it.
Is possible to subtract roman numbers without conversion to decimal numbers?
For Example:
X - III = VII
So in input I have X and III. In output I have VII.
I need algorithm without conversion to decimal number.
Now I don't have an idea.
The most simple algorithm will be to create -- function for Romans. Subtracting A-B means repeating simultaneous A-- and B--, until having nothing in B.
But I wanted to do something more effective
The Roman numbers can be looked at as positional in some very weak way. We'll use it.
Let's make short tables of substraction:
X-V=V
X-I=IX
IX-I=VIII
VIII-I=VII
VII-I=VI
VI-I=V
V-I=IV
IV-I=III
III-I=II
II-I=I
I-I=_
And addition:
V+I=VI
And the same for CLX and MDC levels. Of course, you could create only one table, but to use it on different levels by substitution of letters.
Let's take numbers, for example, A=MMDCVI=2606 a B=CCCXLIII=343
Lets distribute them into levels=powers of 10. The several following operations will be inside levels only.
A=MM+DC+VI, B=CCC+XL+III
Then subtracting
A-B= MM+(DC-CCC)+(-XL)+(VI-III)
At the every level we have three possible letter: units, five-units and ten-units. The combinations (unit, five-units) and (unit, ten-unit) will be translated into differences
A-B= MM+(DC-CCC)+(-L+X)+(VI-III)
The normal combinations (where senior symbol is before junior one), will be translated into sums.
A-B= MM+(D+C-C-C-C)+(-L+X)+(V+I-I-I-I)
Shorten the combinations of same symbols
A-B= MM+(D-C-C)+(-L+X)+(V-I-I)
If some level is negative, borrow a unit from the senior level. Of course, it could work through empty level.
A-B= MM+(D-C-C-C)+(C-L+X)+(V-I-I)
Now, in every level we'll apply the subtraction table we have made, subtracting every minused symbol, strarting from the top of the table and repeating it until no minused members remain.
A-B= MM+(CD-C-C)+(L+X)+(IV-I)
A-B= MM+(CCC-C)+(L+X)+(III)
A-B= MM+(CC)+(L+X)+(III)
Now, use the addition table
A-B= MM+(CC)+(LX)+(III)
Now, we'll open the parenthesis. If there is '_' in some level, there will be nothing on its place.
A-B=MMCCLXIII =2263
The result is correct.
There is a more elegant solution than simply unrolling the whole roman number. The disadvantage of this would be a complexity in O(n) as opposed to O(log n) where n is the input number.
I found this task quite interesting. It is indeed possible without a conversion. Basically, you just have look at the last digit. If they match, take them away, if not, replace the bigger one. However, the whole task gets a lot more complicated by numbers like "IV", because you need a lookahead.
Here is the code. Since this is most likely a homework assignment, I took out some code so you have to think for yourself, how the rest should look like.
private static char[] romanLetters = { 'I', 'V', 'X', 'L', 'C', 'D', 'M' };
private static string[] vals = { "IIIII", "VV", "XXXXX", "LL", "CCCCC", "DD" };
static string RomanSubtract(string a, string b)
{
var _a = new StringBuilder(a);
var _b = new StringBuilder(b);
var aIndex = a.Length - 1;
var bIndex = b.Length - 1;
while (_a.Length > 0 && _b.Length > 0)
{
if (characters match)
{
if (lookahead for a finds a smaller char)
{
aIndex = ReplaceRomans(_a, aIndex, aChar);
continue;
}
if (lookahead for b finds a smaller char)
{
bIndex = ReplaceRomans(_b, bIndex, bChar);
continue;
}
_a.Remove(aIndex, 1);
_b.Remove(bIndex, 1);
aIndex--;
bIndex--;
}
else if (aChar > bChar)
{
aIndex = ReplaceRomans(_a, aIndex, aChar);
}
else
{
bIndex = ReplaceRomans(_b, bIndex, bChar);
}
}
return _a.Length > 0 ? _a.ToString() : "-" + _b.ToString();
}
private static int ReplaceRomans(StringBuilder roman, int index, int charIndex)
{
if (index > 0)
{
var beforeChar = Array.IndexOf(romanLetters, roman[index - 1]);
if (beforeChar < charIndex)
{
Replace e.g. IX with VIIII
}
}
Replace e.g. V with IIIII
}
Apart from checking every possible combination of input numbers - assuming the input is bounded - there is no way to do what you're asking. Roman numerals are awful in terms of mathematical operations.
You could write an algorithm that doesn't convert them, but it'd have to use decimal numbers at some point. Or you could normalize them to e.g. "IIIII...", but again you'd need to write some equivalences like "50 chars = L".
Rough idea:
Create a "map" or list of how each roman numeral relates to simpler numerals, for instance IV corresponds to (II + II), while V corresponds to (III + II), and X corresponds to (V + V).
When calculating e.g. X - III, treat this not as a mathematical term, but a string, which can be changed in several steps, where you each time check for something to remove from both sides of the minus operator:
x - III // Nothing to remove
(V + V) - III // Still nothing to remove
(III + II + III + II) - III // NOW we can remove a "III" from both sides
// while still treating these as roman numerals.
Result: III + II + II
Rejoined: V + II = VII.
If you make each number correspond to something as simple as possible in the "map" (e.g. III can correspond to (II + I), so you don't get stuck with left-overs), then I'm pretty sure you can figure out some kind of solution here.
Of course this requires a bunch of string-operations, comparisons, and a map from which your algorithm can "know" how to compare or switch values. Not exactly traditional maths, but then again, I suppose this is how roman numerals do work.
The basic sketch of my idea is to build up simple converters that chain together via either iterators or observables.
So, for instance, on the input side of things you have a CConverter that performs the transormations of the combinations CD, CM, D and M into CCCC, CCCCCCCCC, CCCCC, and CCCCCCCCCC respectively. All other received inputs are passed through unmolested. Then the next converter in line XConverter converts XL, XC, L and X into the appropriate number of Xs, and so on until you just have a stream of all Is.
Then you perform the subtraction by consuming both of these streams of Is, in lockstep. If the minuend runs out first, then the answer is 0 or negative, in which case everything has gone wrong. Otherwise, when the subtrahend runs out, you just start emitting all remaining Is from the minuend.
Now you need to convert back. So the first INormalizer queues up Is until it's received five of them, then it emits a V. If it reaches the end of the stream and it received four, then it emits IV. Otherwise it just emits as many Is as it received until the end of the stream, and then ends its own stream.
Next, the VNormalizer queues up Vs until it's received two, and then emits an X. If it receives an IV and it has one queued V then it emits IX, otherwise it emits IV.
And if the stream it's receiving ends or just starts sending Is and it still has a V queued, then it emits that, then whatever else the sending stream wanted to send, and then ends its own stream.
And so on, building back up into the correct roman numerals.
Parse the input strings to group the digits in the mixed 5/10 base (M, D, C, L, X, I). I.e. MMXVII yields MM||||X|V|II.
Now subtract from right to left, by canceling the digits in pairs. I.e. V|III - II = V|II - I = V|I.
When required, do a borrow, i.e. split the next highest digit (V splits to IIIII, X to VV...). Example: V|I - III = V| - II = IIIII - II = III. Borrows may need to be recursive, like X||I - III = X|| - II = VV| - II = V|IIIII - II = V|III.
The prefix notation (IV, IX, XL, XC...) makes it a little more complicated. An approach is to preprocess the string to remove them on input (substitute with IIII, VIIII, XXXX, LXXXX...) and postprocess to restore them on output.
Example:
XCIX - LVI = LXXXXVIIII - LVI = L|XXXX|V|IIII - L|V|I = L|XXXX|V|III - L|V| = L|XXXX||III - L|| = XXXX||III = XXXXXIII = XLIII
Pure character processing, no arithmetic involved.
Digits= "MDCLXVI"
Divided= ["DD", "CCCCC", "LL", "XXXXX", "VV", "IIIII"]
def In(Input):
return Input.replace("CM", "DCCCC").replace("CD", "CCCC").replace("XC", "LXXXX").replace("XL", "XXXX").replace("IX", "VIIII").replace("IV", "IIII")
def Group(Input):
Groups= []
for Digit in Digits:
# Split after the last digit
m= Input.rfind(Digit) + 1
Groups.append(Input[:m])
Input= Input[m:]
return Groups
def Decrement(A, i):
if len(A[i]) == 0:
# Borrow
Decrement(A, i - 1)
A[i]= Divided[i - 1] + A[i]
A[i]= A[i][:-1]
def Subtract(A, B):
for i in range(len(Digits) - 1, -1, -1):
while len(B[i]) > 0:
Decrement(A, i)
B[i]= B[i][:-1]
def Out(Input):
return Input.replace("DCCCC", "CM").replace("CCCC", "CD").replace("LXXXX", "XC").replace("XXXX", "XL").replace("VIIII", "IX").replace("IIII", "IV")
A= Group(In("MMDCVI"))
B= Group(In("CCCXLIII"))
Subtract(A, B)
print Out("".join(A))
>>>
MMCCLXIII
How about an Enum?
public enum RomanNumber
{
I = 1,
II = 2,
III = 3,
IV = 4,
V = 5,
VI = 6,
VII = 7,
VIII = 8,
IX = 9
X = 10
}
Then using it like this:
int newRomanNumber = (int) RomanNumber.X - (int) RomanNumber.III
If your input is 'X - III = VII', then you will also have to parse this string.
But I won't do this work for you. ;-)
So, what I'm trying to do this something like this: (example)
a,b,c,d.. etc. aa,ab,ac.. etc. ba,bb,bc, etc.
So, this can essentially be explained as generally increasing and just printing all possible variations, starting at a. So far, I've been able to do it with one letter, starting out like this:
for (int i = 97; i <= 122; i++)
{
item = (char)i
}
But, I'm unable to eventually add the second letter, third letter, and so forth. Is anyone able to provide input? Thanks.
Since there hasn't been a solution so far that would literally "increment a string", here is one that does:
static string Increment(string s) {
if (s.All(c => c == 'z')) {
return new string('a', s.Length + 1);
}
var res = s.ToCharArray();
var pos = res.Length - 1;
do {
if (res[pos] != 'z') {
res[pos]++;
break;
}
res[pos--] = 'a';
} while (true);
return new string(res);
}
The idea is simple: pretend that letters are your digits, and do an increment the way they teach in an elementary school. Start from the rightmost "digit", and increment it. If you hit a nine (which is 'z' in our system), move on to the prior digit; otherwise, you are done incrementing.
The obvious special case is when the "number" is composed entirely of nines. This is when your "counter" needs to roll to the next size up, and add a "digit". This special condition is checked at the beginning of the method: if the string is composed of N letters 'z', a string of N+1 letter 'a's is returned.
Here is a link to a quick demonstration of this code on ideone.
Each iteration of Your for loop is completely
overwriting what is in "item" - the for loop is just assigning one character "i" at a time
If item is a String, Use something like this:
item = "";
for (int i = 97; i <= 122; i++)
{
item += (char)i;
}
something to the affect of
public string IncrementString(string value)
{
if (string.IsNullOrEmpty(value)) return "a";
var chars = value.ToArray();
var last = chars.Last();
if(char.ToByte() == 122)
return value + "a";
return value.SubString(0, value.Length) + (char)(char.ToByte()+1);
}
you'll probably need to convert the char to a byte. That can be encapsulated in an extension method like static int ToByte(this char);
StringBuilder is a better choice when building large amounts of strings. so you may want to consider using that instead of string concatenation.
Another way to look at this is that you want to count in base 26. The computer is very good at counting and since it always has to convert from base 2 (binary), which is the way it stores values, to base 10 (decimal--the number system you and I generally think in), converting to different number bases is also very easy.
There's a general base converter here https://stackoverflow.com/a/3265796/351385 which converts an array of bytes to an arbitrary base. Once you have a good understanding of number bases and can understand that code, it's a simple matter to create a base 26 counter that counts in binary, but converts to base 26 for display.
Okay this is probably more a maths question but since its related to programming and my web application i'll ask here first:
I'm trying to create short id's that are 8 characters long . The "pool" to draw the id from is a combination of numbers, upper and lower case letters.
string charPool = "ABCDEFGOPQRSTUVWXY1234567890ZabcdefghijklmHIJKLMNnopqrstuvwxyz"
And if you're interested here's the method:
private string GenerateRandomCode(int length)
{
string charPool = "ABCDEFGOPQRSTUVWXY1234567890ZabcdefghijklmHIJKLMNnopqrstuvwxyz";
StringBuilder rs = new StringBuilder();
for (int i = 0; i < length; i++)
{
rs.Append(charPool[(int)(_random.NextDouble() * charPool.Length)]);
}
return rs.ToString();
}
How many possible combinations are there for 8 character id's? Grateful if you can post the equation as well :)
Thanks
options per slot ^ number of slots = number of combinations
a-z is 26, times 2 (for uppers as well) is 52, plus 10 (0-9) is 62. Each ID is 8 chars long, so the result is 62^8, which is pretty big:
218,340,105,584,896 possible unique ID's
I would suggest doing:
_random.Next(charPool.Length - 1)
(and saving charPool.Length - 1 in a variable outside of the loop), instead of:
_random.NextDouble() * charPool.Length
Because you might get an exact 1.0 with .nextDouble(), which means you will be accessing the array at an index that equals the length, and you will get IndexOutOfRangeException.