Format unstructured string - c#

I have tried several methods (by position, by whitespace, with a regex) but cannot figure out how best to parse the following lines as a table. For example, say the header and the two lines I want to parse are:
Bonds Bid Offer (mm) (mm) Chng
STACR 2015-HQA1 M1 125 120 5 x 1.5 0
STACR 2015-HQA12 2M2 265 5 x -2
I want them parsed as follows for [BondName] [Bid] [Offer]:
[STACR 2015-HQA1 M1] [125] [120]
[STACR 2015-HQA12 2M2] [265] [null]
Notice the null, which is an actual value; the spaces in the bond name should also be retained. FYI, the bond name will always contain 2 spaces, as in the examples above.
Edit: Since many of you have asked for code, here it is. The number of spaces between the data points can range from 1 to 5, so I cannot rely on the spacing (otherwise this would be straightforward).
string bondName = quoteLine.Substring(0, 19);
string bid = quoteLine.Substring(19, 5).Trim();
string offer = quoteLine.Substring(24, 6).Trim();
The only way I can see this working is that:
1st data point is STACR (Type)
2nd data point is the year and Series (e.g. 2015-HQA1)
3rd data point is Tranche (M1)
4th data point is bid (e.g. 125; bid is always available)
5th data point is offer (e.g. 120, but it can be blank or whitespace, which introduces complexity)

With the current set of requirements, I'm assuming the following:
1. String starts with 3 part bond name
2. Followed by bid
3. Followed by offer (optional)
4. After that, we'll have something like ... x ... ... (we'll use x as reference)
Given that these assumptions hold, you can use the following code:
using System;
using System.Linq;

var str = "STACR 2015-HQA1 M1 125 120 5 x 1.5 0"; // your data
var parts = str.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
// we'll use this pattern: <3 part bond name> <bid> <offer/null> <something x ....>
var xIsAt = parts.IndexOf("x"); // we'll use x as reference
if (xIsAt > 2) // first three are BondName
    parts.RemoveRange(xIsAt - 1, parts.Count - xIsAt + 1); // remove "5 x 1.5 ..."
var bond = string.Join(" ", parts.Take(3)); // first 3 parts are bond
var bid = parts.Count > 3 ? parts.ElementAt(3) : null; // 4th is bid
var offer = parts.Count > 4 ? parts.ElementAt(4) : null; // 5th is offer
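To check the approach against both sample lines, including the one with a blank offer, the snippet can be wrapped in a small helper. This is only a sketch: the name ParseQuote and the tuple return type are made up for illustration, the extra padding in the sample lines is guessed, and it still assumes a three-part bond name and an "x" between the two size columns.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of the original answer): parse one quote line
// into (bond, bid, offer), using "x" as the reference point.
static (string Bond, string Bid, string Offer) ParseQuote(string line)
{
    var parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();

    // Drop the size before "x", the "x" itself, and everything after it.
    var xIsAt = parts.IndexOf("x");
    if (xIsAt > 2)
        parts.RemoveRange(xIsAt - 1, parts.Count - xIsAt + 1);

    var bond = string.Join(" ", parts.Take(3));     // first 3 parts are the bond name
    var bid = parts.Count > 3 ? parts[3] : null;    // 4th part is the bid
    var offer = parts.Count > 4 ? parts[4] : null;  // 5th part, if present, is the offer
    return (bond, bid, offer);
}

// Sample lines from the question; the amount of padding is an assumption.
var first = ParseQuote("STACR 2015-HQA1 M1     125   120    5 x 1.5    0");
var second = ParseQuote("STACR 2015-HQA12 2M2   265         5 x       -2");

Console.WriteLine("[{0}] [{1}] [{2}]", first.Bond, first.Bid, first.Offer ?? "null");
Console.WriteLine("[{0}] [{1}] [{2}]", second.Bond, second.Bid, second.Offer ?? "null");
```

Because the split discards empty entries, the number of spaces between columns does not matter; only the position of "x" does.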

[EDIT]
I did not account for the blank 'Offer', so this method will fail when 'Offer' is blank. It looks like someone already has a working answer, but I'll leave the LINQ example for anyone who finds it useful.
[END EDIT]
LINQ-based option.
Split the string on spaces and remove the empty entries. Then reverse the order so you can start from the back and work your way forward; the data appears more normalized at the end of the string.
For each successive part of the line, you skip the parts already handled and take only what you need. For the last part, which is the multi-word bond name, you skip everything you don't need, reverse the order back to normal, and join the segments together with spaces.
string test = "STACR 2015-HQA1 M1 125 120 5 x 1.5 0";
var split_string_remove_empty = test.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Reverse();
var change = split_string_remove_empty.Take(1)
    .SingleOrDefault();
var mm2 = split_string_remove_empty.Skip(1)
    .Take(1)
    .SingleOrDefault();
var mm3 = split_string_remove_empty.Skip(3)
    .Take(1)
    .SingleOrDefault();
var offer = split_string_remove_empty.Skip(4)
    .Take(1)
    .SingleOrDefault();
var bid = split_string_remove_empty.Skip(5)
    .Take(1)
    .SingleOrDefault();
var bonds = string.Join(" ", split_string_remove_empty.Skip(6)
    .Reverse());

Related

Replace any index in a text frame with output of a method

I designed a message template with numbered placeholders, to be filled in for each person in a list, like the one below:
Dear {0}
Hi,
the total amount of Draft is {1}.
amount of prm is {2}
yesterday amount is {3}
I wrote a method which returns all the different types of amounts and puts its output in a list. I want to replace each placeholder in the template with the correct amount.
For example, the output looks like the list below:
sale    reject amount    damage amount
1230    56555            79646354
My method is below:
public List<outputList1> listAmount()
{
    var amounts = (from p in db.FactTotalAmount
                   group p by p.FromDate into g
                   select new outputList1
                   {
                       YesterdaySalesPrm = g.Sum(x => x.YesterdaySalesPrm),
                       YesterdayDraftAmount = g.Sum(x => x.YesterdayDraftAmount),
                       PrmSales = g.Sum(x => x.PrmSales),
                       DraftAmount = g.Sum(x => x.DraftAmount)
                   }).ToList();
    return amounts;
}
Could you please help me with how to do this?

I'm going to teach you to fish.
There are two main ways to build a string using a template - formatting and interpolation.
Option one: use string.Format:
string output = string.Format("Today is {0}. Weather is {1} at {2}°.", "Monday", "rain", 75.2);
// result is "Today is Monday. Weather is rain at 75.2°."
Option two: use C# 6 string interpolation:
string dayOfWeek = "Monday";
string weather = "rain";
decimal temp = 75.2m; // the "m" suffix is required for a decimal literal
// Notice the "$" at the start of the string literal
string output = $"Today is {dayOfWeek}. Weather is {weather} at {temp}°.";
So, you have a model - the data you've collected - and a format string. Combine those together with one of these options to produce the final output string.
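Applied to the template from the question, that could look like the sketch below. The variable names and the sample values are placeholders (the real amounts would come from the listAmount() query), and which amount belongs in which placeholder is an assumption.

```csharp
using System;

// The message template from the question, with numbered placeholders.
string template =
    "Dear {0}\n" +
    "Hi,\n" +
    "the total amount of Draft is {1}.\n" +
    "amount of prm is {2}\n" +
    "yesterday amount is {3}";

// Placeholder values; in the real application these would come from the query result.
string personName = "Alice";
decimal draftAmount = 1230m;
decimal prmSales = 56555m;
decimal yesterdayDraftAmount = 79646354m;

// string.Format substitutes the arguments by position: {0} is the first one, and so on.
string message = string.Format(template, personName, draftAmount, prmSales, yesterdayDraftAmount);
Console.WriteLine(message);
```

To send one message per person, the same string.Format call can be run inside a loop over the list.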

Isolating values from a listbox line

I have a line in a listbox that I need in order to print the receipt for my 12th-grade final project.
Example of my line: "cha1 Adidas Stan Smith White 1 2" (it's padded).
What I want to do is isolate the fields (cha1, Adidas Stan Smith White, 1, 2) so I can add them to my Microsoft Access database. I somehow managed to do it with Substring, but I screwed up my code and now I can't do it. Can somebody help me, please?
My code, which used to work, looks like this:
foreach (string item in lstpreview.Items)
{
    // create the string to print on the receipt
    string nomeproduto = item;
    float quantidade = float.Parse(item.Substring(item.Length - 5, 5));
    float precounitario = float.Parse(item.Substring(item.Length - 5, 5));
    string totalproduto = item.Substring(item.Length - 6, 6);
    txt1.Text = Convert.ToString(quantidade);
    txt2.Text = Convert.ToString(precounitario);
    //MessageBox.Show(item.Substring(item.Length - 5, 5) + "PROD TOTAL: " + totalproduto);
    //float totalprice = 0.00f;
}

You say that the line is padded, but do not give any details. If you know that the first field is always the first 4 characters of the line, you can isolate it with string.Substring:
string field1 = line.Substring(0, 4);
and similarly for the other fields.
P.S. Please edit your post and remove the swear word.
Edit after parsing code added
I don't understand your comment, what is "your negative value"? Run the code in the debugger and find which line causes the error. Please post the exact error message.
Is there a reason for converting the substring to a float and then back to a string? I can imagine that you might want to validate that the field is numeric, but then you would be better to use TryParse.
Your second comment is helpful. The last 5 characters of the line are not all numeric, that's the problem.
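A rough sketch of the TryParse idea follows. The exact padding widths were never posted, so this version splits on whitespace instead of using fixed positions and assumes the last two tokens are the numeric fields (price, then quantity); the padding in the sample line and the variable names are guesses.

```csharp
using System;
using System.Globalization;

// Sample line from the question; the amount of padding is an assumption.
string item = "cha1   Adidas Stan Smith White        1     2";
string[] tokens = item.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// Assumed layout: code, product name (any number of words), price, quantity.
string priceText = tokens[tokens.Length - 2];
string quantityText = tokens[tokens.Length - 1];

// TryParse reports failure through its return value instead of throwing.
bool priceOk = float.TryParse(priceText, NumberStyles.Float, CultureInfo.InvariantCulture, out float price);
bool quantityOk = float.TryParse(quantityText, NumberStyles.Float, CultureInfo.InvariantCulture, out float quantity);

string code = tokens[0];
string name = string.Join(" ", tokens, 1, tokens.Length - 3);

if (priceOk && quantityOk)
    Console.WriteLine("{0} | {1} | {2} | {3}", code, name, price, quantity);
else
    Console.WriteLine("The last two fields are not numeric: '" + item + "'");
```

If a line ever has fewer than three tokens this will throw, so a length check belongs in front of it in real code.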

Done it with this snippet of code, inside a foreach loop.
string[] caracteresnastring = item.Split(',');
string code = caracteresnastring[0];
string name = caracteresnastring[1];
string price = caracteresnastring[2];
string quantity = caracteresnastring[3];

String formatting in C# to get identical spacing

I've been looking up string formatting and frankly I'm getting confused. This is what I want to do.
I have a "character stats" page (this is a console app), and I want it formatted like this:
=----------------------------------=
= Strength: 24     | Agility: 30   =
= Dexterity: 30    | Stamina: 28   =
= Magic: 12        | Luck: 18      =
=----------------------------------=
I guess basically I'm trying to find out how to make that middle '|' divider be in the same place regardless of how many letters the stat is or how many points the stat is.
Thanks for the input.
Edit: I also want the ending '=' to also be in the same spot.

I learned something new, it seems! As some of the others have mentioned, you can accomplish the same thing using String.Format.
The format items in the composite format string passed to String.Format can also include an optional alignment component.
//                index  alignment
//                    v  v
String.Format("Hello {0,-10}!", "World");
When the alignment is negative, the string is left-aligned. When it is positive, the string is right-aligned. In both cases, the string is padded with whitespace if it is shorter than the specified width (otherwise, the string is inserted in full).
I believe this is an easier and more readable technique than having to fiddle with String.PadRight.
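As a sketch of how that could produce the whole stats box: the column widths of 16 and 13 below are chosen so that each row comes out as wide as a 36-character border, and the stat values are the ones from the question.

```csharp
using System;
using System.Collections.Generic;

// Border: "=" + 34 dashes + "=" is 36 characters wide.
string border = "=" + new string('-', 34) + "=";

// Negative alignment left-aligns and pads on the right:
// 2 + 16 + 3 + 13 + 2 = 36 characters per row.
string rowFormat = "= {0,-16} | {1,-13} =";

var lines = new List<string>
{
    border,
    string.Format(rowFormat, "Strength: " + 24, "Agility: " + 30),
    string.Format(rowFormat, "Dexterity: " + 30, "Stamina: " + 28),
    string.Format(rowFormat, "Magic: " + 12, "Luck: " + 18),
    border
};

foreach (string line in lines)
    Console.WriteLine(line);
```

The divider and the closing '=' stay in the same column as long as no "label: value" pair is longer than its column width.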
You can also use String.PadRight (or String.PadLeft). Example:
class Stats {
    // Contains properties as you defined ...
}
var stats = new Stats(...);
int leftColWidth = 16;
int rightColWidth = 13;
var sb = new StringBuilder();
sb.AppendLine("=----------------------------------=");
sb.Append("= ");
sb.Append(("Strength: " + stats.Strength.ToString()).PadRight(leftColWidth));
sb.Append(" | ");
sb.Append(("Agility: " + stats.Agility.ToString()).PadRight(rightColWidth));
// And so on.

I used to use this technique a lot back in the 80's doing text based games. Obviously we didn't have string.Format back in those days; but it allows you to visualize the layout in the code.
Pre-format the text as you want it to be laid out, then just use the string.Format() function like so...
string formattedText = @"
=----------------------------------=
= Strength: {0,2}     | Agility: {3,2}   =
= Dexterity: {1,2}    | Stamina: {4,2}   =
= Magic: {2,2}        | Luck: {5,2}      =
=----------------------------------=".Trim();
string output = string.Format(formattedText, 12, 13, 14, 15, 16, 1);
Console.WriteLine(output);
Console.ReadLine();

String.Format("{0,-20}|","Dexterity: 30")
would align the value to the left and pad it to 20 characters. The only problem is that if the parameter is longer than 20 it would not be truncated.
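If over-long values must not push the divider out of place, one option is to truncate before padding. A minimal sketch, where Fit is a made-up helper name rather than a framework method:

```csharp
using System;

// Hypothetical helper: force a value to exactly `width` characters by
// truncating long values and right-padding (left-aligning) short ones.
static string Fit(string value, int width)
{
    if (value.Length > width)
        value = value.Substring(0, width);
    return value.PadRight(width);
}

string shortCell = Fit("Dexterity: 30", 20);
string longCell = Fit("A stat name that is far too long: 30", 20);

Console.WriteLine(shortCell + "|");
Console.WriteLine(longCell + "|");
```

Both cells come out exactly 20 characters wide, so the '|' lands in the same column on both lines.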

You will need to use a String.PadRight or a String.PadLeft. Do something like this:
Trip_Name1 = Trip_Name1.PadRight(20,' ');
This is what you are looking for I think.

c# regular expression

I have an output like -
Col.A Col.B Col.C Col.D
--------------------------------------------------------------
* 1 S60-01-GE-44T-AC SGFM115001195 7520051202 A
1 S60-PWR-AC APFM115101302 7520047802 A
1 S60-PWR-AC APFM115101245 7520047802 A
or
Col.A Col.B Col.C Col.D
--------------------------------------------------------------
* 0 S50-01-GE-48T-AC DL252040175 7590005605 B
0 S50-PWR-AC N/A N/A N/A
0 S50-FAN N/A N/A N/A
For these outputs the regular expression -
(?:\*)?\s+(?<unitno>\d+)\s+\S+-\d+-(?:GE|TE)?-?(?:\d+(?:F|T))-?(?:(?:AC)|V)?\s+(?<serial>\S+)\s+\S+\s+\S+\s+\n
works fine to capture Column A and Column B. But recently I got a new kind of output -
Col.A Col.B Col.C Col.D
---------------------------------------------------------
* 0 S4810-01-64F HADL120620060 7590009602 A
0 S4810-PWR-AC H6DL120620060 7590008502 A
0 S4810-FAN N/A N/A N/A
0 S4810-FAN N/A N/A N/A
As you can see, the patterns "GE|TE" and "AC|V" are missing from these outputs. How do I change my regular expression accordingly while maintaining backward compatibility?
EDIT:
The output that you see comes in a complete string and due to some operational limits I cannot use any other concept other than regex here to get my desired values. I know using split would be ideal here but I cannot.

You are probably better off using String.Split() to break the column values out into separate strings and then processing them, rather than using a huge unreadable regular expression.
foreach (string line in lines) {
    string[] columnValues = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    ...
}

A regular expression does not seem to be the right approach here. Use a positional approach:
string s = "* 0 S4810-01-64F HADL120620060 7590009602 A";
bool withStar = s[0] == '*';
string nr = s.Substring(2, 2).Trim();
string colA = s.Substring(5, 18).TrimEnd();
string colB = s.Substring(24, 14).TrimEnd();
...
UPDATE
If you want (or must) stick to Regex, test for the spaces instead of the values. Of course, this works only if the values never include spaces.
string[] result = Regex.Split(s, @"\s+");
Of course, you can also search for non-spaces (\S) instead of spaces (\s).
MatchCollection matches = Regex.Matches(s, @"\S+");
or, excluding the star:
(?:\*)?[^*\s]+
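Put together, the star-excluding pattern could be used roughly like this. It is a sketch that assumes the values never contain spaces; the padding in the sample line and the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// One data line; the amount of padding between columns is an assumption.
string s = "* 0   S4810-01-64F    HADL120620060   7590009602   A";

// Each match is one run of characters that are neither whitespace nor '*'.
string[] fields = Regex.Matches(s, @"(?:\*)?[^*\s]+")
                       .Cast<Match>()
                       .Select(m => m.Value)
                       .ToArray();

bool withStar = s.TrimStart().StartsWith("*", StringComparison.Ordinal);
string unitNo = fields[0];   // "0"
string colA = fields[1];     // "S4810-01-64F"
string colB = fields[2];     // "HADL120620060"

Console.WriteLine("{0} {1} {2} {3}", withStar, unitNo, colA, colB);
```

The star is detected separately because the pattern deliberately skips it.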

Your regular expression doesn't even require GE or TE. See the ? after (?:GE|TE): it means that the preceding group or symbol is optional.
The same is true of the AC and V section.
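This is easy to check by running the unchanged pattern against the first data line of an old and a new report. The sketch below does that; the column spacing and the "\r\n" line endings are assumptions (the pattern needs at least one whitespace character before the final "\n").

```csharp
using System;
using System.Text.RegularExpressions;

// The regular expression from the question, unchanged.
string pattern =
    @"(?:\*)?\s+(?<unitno>\d+)\s+\S+-\d+-(?:GE|TE)?-?(?:\d+(?:F|T))-?(?:(?:AC)|V)?\s+(?<serial>\S+)\s+\S+\s+\S+\s+\n";

// First data line of an old and of a new report (spacing assumed).
string oldLine = "* 1  S60-01-GE-44T-AC  SGFM115001195  7520051202  A\r\n";
string newLine = "* 0  S4810-01-64F      HADL120620060  7590009602  A\r\n";

foreach (string line in new[] { oldLine, newLine })
{
    Match m = Regex.Match(line, pattern);
    Console.WriteLine("{0}: unitno={1}, serial={2}",
        m.Success, m.Groups["unitno"].Value, m.Groups["serial"].Value);
}
```

Both lines match, because every part that is missing from the new model name is already marked optional.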

I would not use regular expressions to parse these reports.
Instead, treat them as fixed column width reports after the headers are stripped off.
I would do something like (this is typed cold as an example, not tested even for syntax):
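// Error detection is left out
using System.Collections.Generic;

class ColumnDef
{
    public string Name { get; set; }
    public int FirstCol { get; set; } // zero-based index of the first character
    public int LastCol { get; set; }  // zero-based index of the last character (inclusive)
}
ColumnDef[] report = new ColumnDef[]
{
    new ColumnDef { Name = "ColA", FirstCol = 0, LastCol = 2 },
    // ... and so on for each column
};
IDictionary<string, string> ParseDataLine(string line)
{
    var values = new Dictionary<string, string>();
    foreach (var c in report)
    {
        // Substring takes a start index and a length, not an end index.
        values[c.Name] = line.Substring(c.FirstCol, c.LastCol - c.FirstCol + 1).Trim();
    }
    return values;
}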
This is an example of a generic ETL (Extract, Transform, and Load) problem--specifically the Extract stage.
You will have to strip out header and footer lines before using ParseDataLine, and I am not sure there is enough information shown to do that. Based on what your post says, any line that is blank, or doesn't start with a space or a * is a header/footer line to be ignored.

Why not try something like this:
(?:\*)?\s+(?<unitno>\d+)\s+\S+\s+(?<serial>\S+)\s+\S+\s+\S+(?:\s+)?\n
This is built off your provided regular expression, and because of the trailing \n, every line of the input (including the last one) needs to end with a newline.
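A small sketch of how this pattern behaves on the new output. Only the data lines are included; the indentation of the lines without a star and the column spacing are assumptions, based on the star occupying its own column.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string pattern = @"(?:\*)?\s+(?<unitno>\d+)\s+\S+\s+(?<serial>\S+)\s+\S+\s+\S+(?:\s+)?\n";

// Data lines only; every line, including the last one, ends with "\n".
string report =
    "* 0  S4810-01-64F  HADL120620060  7590009602  A\n" +
    "  0  S4810-PWR-AC  H6DL120620060  7590008502  A\n" +
    "  0  S4810-FAN     N/A            N/A         N/A\n";

// Collect the serial column of every matched line.
var serials = new List<string>();
foreach (Match m in Regex.Matches(report, pattern))
{
    serials.Add(m.Groups["serial"].Value);
    Console.WriteLine("unit {0}, serial {1}", m.Groups["unitno"].Value, m.Groups["serial"].Value);
}
```

Unlike the original expression, this one no longer inspects the model name at all, so it also picks up the power supply and fan lines.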

Split values in arrays

I have a long string from which I want to store the keywords in an array or collection. The format of my string is as follows:
Title: My Test Page Title.
Desc: My page description.
Keywords: Bessel function, legendre function, Differential Equations, Bessel, Legendre, Homogenous, Assignment & Maths Homework Help.
Bessel & Legendre Function:
Homogenous Equations of the second order of the type
+ x + ( - )y = 0, v [0, ), x [0, )………………….(1)
(1 - ) - 2x + n (n + 1)y = 0, n = 1, 2 ……, x (-1, 1)…………………(2)
From this string I want to store all the keywords in an array or collection, split on the commas.
My problem is how to find the starting and ending points of the keyword list. I can get the starting point from "Keywords:", but what should my ending point be? There is no fixed format.
The only thing that is fixed is that a new paragraph starts after the end of the Keywords section.
Can anyone suggest a regular expression for this?

"there will be a Para"
It seems like you should first split the string into lines.
The line that starts with "Keywords:" then holds your keywords.
You can use the string.Split() method both to split the string into lines and to break out the keywords.
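A minimal sketch of that idea, assuming the whole keyword list sits on a single line; stripping the trailing full stop from the last keyword is an extra assumption.

```csharp
using System;
using System.Linq;

string data =
    "Title: My Test Page Title.\n" +
    "Desc: My page description.\n" +
    "Keywords: Bessel function, legendre function, Differential Equations, Bessel, Legendre, Homogenous, Assignment & Maths Homework Help.\n" +
    "Bessel & Legendre Function:\n" +
    "Homogenous Equations of the second order of the type";

const string prefix = "Keywords:";

// Split into lines (handles both \r\n and \n), then pick the keywords line.
string[] lines = data.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
string keywordLine = lines.FirstOrDefault(l => l.StartsWith(prefix, StringComparison.Ordinal));

// Split the rest of that line on commas and tidy up each keyword.
string[] keywords = keywordLine == null
    ? new string[0]
    : keywordLine.Substring(prefix.Length)
                 .Split(',')
                 .Select(k => k.Trim().TrimEnd('.'))
                 .Where(k => k.Length > 0)
                 .ToArray();

foreach (string keyword in keywords)
    Console.WriteLine(keyword);
```

No end marker is needed with this approach, because the end of the line is the end of the keyword list.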

It also looks like the Keywords section ends with a full stop, so you could find the next full stop, i.e. IndexOf("."), after "Keywords:".

I think this should do:
string afterKeywords = data.Substring(data.IndexOf("Keywords:") + 9);
string beforeNextPara = afterKeywords.Substring(0, afterKeywords.IndexOf(Environment.NewLine + Environment.NewLine));
var dataWeNeed = beforeNextPara.Split(',');
