tokenizing a string between nested delimiters - c#

So, splitting a string based on a delimiter is easy with good ol' string.Split. Now let's say I want to split on an open curly bracket and a closed curly bracket. Also straightforward with:
var foo = "{foo}{bar}";
var splitme = foo.Split(new char[] { '{', '}'});
Now let's make it more complicated by adding nested { } inside the initial opening/closing { }, up to n levels deep. What I'm after is trying to parse what looks to be a proprietary text file format for game mods (Stellaris, great game), and I'm looking for a good way to parse this thing. How would I go about preserving each part of the bracketized (tokenized?) piece of text? Adding to the mix is preserving a key-value pair sort of business, using an = as the indicator of a relation.
Here is an example of something I'm trying to parse in this fashion:
#Neutronium Materials
tech_ship_armor_5 = {
area = engineering
cost = #tier3cost4
tier = 3
category = { materials }
ai_update_type = military
prerequisites = { "tech_ship_armor_4" "tech_mine_neutronium" }
weight = #tier3weight4
weight_modifier = {
factor = 1.25
modifier = {
factor = 1.25
research_leader = {
area = engineering
has_trait = "leader_trait_expertise_materials"
}
}
}
ai_weight = {
modifier = {
factor = 1.25
research_leader = {
area = engineering
has_trait = "leader_trait_expertise_materials"
}
}
}
}
My first approach was to read this bad boy line by line with a StreamReader, and keep track of how many { I run into before they start getting closed with the corresponding }. Within each chunk of {} I hunt down that = and then figure out my key value pair that I just found, and where it exists in the hierarchy. This... doesn't seem ideal. Is there a better way with some regex magic or an off the shelf text parsing library?

My first thought would be to look at a JSON parser and see how it's done there.
Your sample looks to be best parsed via recursion: for example, consider tech_ship_armor_5 to be an object, get its opening tag, verify existence of its closing tag and go from there.
So then you'd have a tech_ship_armor_5.area property with a value of engineering; the value of the category property would then be another object materials with properties of its own.
Yep, JSON-like parsing is the way to go with this.
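A minimal sketch of the recursive approach suggested above, not a complete parser for the game's format. The class and method names (ClausewitzSketch, Tokenize, ParseBlock) are made up for illustration. It assumes that a '#' starts a comment only when it is the first non-blank character on a line (the sample also uses '#' inside values), and that bare values such as { materials } are stored with a null key.

```csharp
using System;
using System.Collections.Generic;

public static class ClausewitzSketch // hypothetical name, not an existing library
{
    // Splits the text into tokens: "{", "}", "=", quoted strings (quotes kept) and bare words.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        foreach (var rawLine in text.Split('\n'))
        {
            var line = rawLine.Trim();
            if (line.Length > 0 && line[0] == '#') continue; // assumption: comment lines start with '#'
            int i = 0;
            while (i < line.Length)
            {
                char c = line[i];
                if (char.IsWhiteSpace(c)) { i++; }
                else if (c == '{' || c == '}' || c == '=') { tokens.Add(c.ToString()); i++; }
                else if (c == '"')
                {
                    int end = line.IndexOf('"', i + 1);
                    if (end < 0) { tokens.Add(line.Substring(i)); i = line.Length; } // unterminated quote
                    else { tokens.Add(line.Substring(i, end - i + 1)); i = end + 1; }
                }
                else
                {
                    int start = i;
                    while (i < line.Length && !char.IsWhiteSpace(line[i]) && "{}=\"".IndexOf(line[i]) < 0) i++;
                    tokens.Add(line.Substring(start, i - start));
                }
            }
        }
        return tokens;
    }

    // Reads entries until a closing "}" or the end of input.
    // Each value is either a string or a nested list of entries.
    public static List<KeyValuePair<string, object>> ParseBlock(List<string> t, ref int pos)
    {
        var entries = new List<KeyValuePair<string, object>>();
        while (pos < t.Count && t[pos] != "}")
        {
            string key = null;
            if (pos + 1 < t.Count && t[pos + 1] == "=") { key = t[pos].Trim('"'); pos += 2; }
            if (pos >= t.Count) throw new FormatException("value expected after '='");
            object value;
            if (t[pos] == "{")
            {
                pos++; // consume "{"
                value = ParseBlock(t, ref pos); // recurse into the nested block
                if (pos >= t.Count) throw new FormatException("missing '}'");
                pos++; // consume "}"
            }
            else value = t[pos++].Trim('"');
            entries.Add(new KeyValuePair<string, object>(key, value));
        }
        return entries;
    }

    public static List<KeyValuePair<string, object>> Parse(string text)
    {
        var tokens = Tokenize(text);
        int pos = 0;
        var root = ParseBlock(tokens, ref pos);
        if (pos != tokens.Count) throw new FormatException("unexpected '}'");
        return root;
    }
}
```

Calling ClausewitzSketch.Parse on the sample would give one top-level entry keyed tech_ship_armor_5 whose value is a nested list. A list of pairs is used rather than a dictionary because keys such as modifier or factor can repeat within a block.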

Related

Convert string to array of strings by regex

What is the best way to achieve the following result?
input string: this is a {test} for {performance}.
output string[]: ["this is a ", "{test}", " for ", "{performance}", "."]
This is what I have so far:
private void StringToArray(string text)
{
var firstSplit = text.Split('{');
var list = new List<string>();
foreach(var s in firstSplit)
{
if (s.Contains("}"))
{
var secondSplit = s.Split('}');
list.Add("{" + secondSplit[0] + "}");
if(secondSplit.Count() > 1)
{
list.Add(secondSplit[1]);
}
}
else
{
list.Add(s);
}
}
Console.WriteLine(string.Join(",", list.ToArray()));
}
If you want to get all computer-science-y, you could build a state machine that walks the string one character at a time, using a stack to track the start position for your next string section. When you find a { or } character, you pop the prior stack value and use it to create a substring you can add to your array. Then you push the current index onto the stack.
This can be more or less complex based on how carefully you want to handle nesting (e.g. "This is { a {test}.}"), escape characters (e.g. {{ or \{), unbalanced strings (e.g. This {is} a {Test), whitespace within brace values (e.g. This {is some} text) or excluding things like quoted strings (e.g. "This is the text, "A {person} lived there."") But a character-by-character state machine will tend to be the best-performing option.
At a higher level, you can look at a domain-specific language, or at using a lexer/parser tool. Unfortunately, there's very little in the middle. RegEx can be made to work, but it's generally awful for this because your input can't be said to be formally "regular". Splitting on word boundaries will be error-prone. Basic string manipulation (e.g. IndexOf()) is slower and no simpler or easier to write than the state machine would be.
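A rough sketch of the character-by-character idea, under some simplifying assumptions: it tracks nesting with a depth counter and a single start index instead of an explicit stack, keeps an outermost {...} section (including anything nested in it) as one piece, and treats a stray } as plain text. BraceSplitter is a made-up name, and escapes and quoted strings are not handled.

```csharp
using System;
using System.Collections.Generic;

public static class BraceSplitter // hypothetical helper name
{
    public static List<string> Split(string text)
    {
        var parts = new List<string>();
        int start = 0; // index where the current section began
        int depth = 0; // number of '{' currently open
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '{')
            {
                if (depth == 0)
                {
                    // Flush the plain text collected before this brace, if any.
                    if (i > start) parts.Add(text.Substring(start, i - start));
                    start = i;
                }
                depth++;
            }
            else if (text[i] == '}' && depth > 0)
            {
                depth--;
                if (depth == 0)
                {
                    parts.Add(text.Substring(start, i - start + 1)); // include both braces
                    start = i + 1;
                }
            }
        }
        // Trailing plain text, or an unclosed brace section, is kept as the last piece.
        if (start < text.Length) parts.Add(text.Substring(start));
        return parts;
    }
}
```

For the input from the question, BraceSplitter.Split("this is a {test} for {performance}.") yields the five pieces "this is a ", "{test}", " for ", "{performance}" and ".".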

How to search for specific char in an array and how to then manipulate that index in c#

Okay, so I'm creating a hangman game and everything functions so far, including what I'm TRYING to do in the question.
But it feels like there is a much more efficient method of obtaining the char that is also easier to manipulate the index.
protected static void alphabetSelector(string activeWordAlphabet)
{
char[] activeWord = activeWordAlphabet.ToCharArray();
string activeWordString = new string(activeWord);
Console.WriteLine("If you'd like to guess a letter, enter the letter. \n
If you'd like to guess the word, please type in the word. --- testing answer{0}",
activeWordString);
//Console.WriteLine("For Testing Purposes ONLY");
String chosenLetter = Console.ReadLine();
//Char[] letterFinder = Array.FindAll(activeWord, s => s.Equals(chosenLetter));
//string activeWordString = new string(activeWord);
foreach (char letter in activeWord);
{
if(activeWordString.Contains(chosenLetter))
{
Console.WriteLine("{0}", activeWordString);
Console.ReadLine();
}
else
{
Console.WriteLine("errrr...wrong!");
Console.ReadLine();
}
}
}
I have broken up the code in some areas to prevent the reader from having to scroll sideways. If this is bothersome, please let me know and I'll leave it in the future.
So this code will successfully print out the 'word' whenever I select the correct letter from the random word (I have the console print the actual word so that I can test it successfully each time). It will also print 'wrong' when I choose a letter NOT in the string.
But I feel like I should be able to use the
Array.FindAll(activeWord, ...)
functionality or some other way. But every time I try and reorder the arguments, it gives me all kinds of different errors and tells me to redo my arguments.
So, if you can look at this and find an easier method of searching the actual array for the user-selected 'letter', please help!! Even if it's not using the Array.FindAll method!!
Edit
Okay, it seems like there's some confusion with what I've done and why I've done it.
I'm ONLY printing the word inside that 'if' statement to test and make sure that the foreach{if{}} will actually work to find the char inside the string. But I ultimately need to be able to provide a placeholder for a char that is successfully found, as well as being able to 'cross out' the letter (from the alphabet list not shown here).
It's hangman - surely you guys know what I'm needing it to do. It has to keep track of which letters are left in the word, which letters have been chosen, as well as which letters are left in the entire alphabet.
I'm a 4-day old newb when it comes to programming, so please. . . I'm only doing what I know to do and when I get errors, I comment things out and write more until I find something that works.
Take a look at this demo I put together for you: https://dotnetfiddle.net/eP9TQM
I'd suggest creating a second string for the display string. Use a StringBuilder, and you can replace the characters in it at specific indices while creating the fewest number of string objects in the process.
string word = "your word or phrase here";
//Initialize a new StringBuilder that will display the word with placeholders.
StringBuilder display = new StringBuilder(word.Length); //You know the display word is the same length as the original word
display.Append('-', word.Length); //Fill it with placeholders.
So now you have your phrase/word, and a string builder full of characters that need to be discovered.
Go ahead and convert the display StringBuilder to a string that you can check on each pass to see if it equals your word:
var displayString = display.ToString();
//Loop until the display string is equal to the word
while (!displayString.Equals(word))
{
//Inside here your logic will follow.
}
So you are basically looping until the person answers here. You could of course go back and add logic to limit the number of attempts, or whatever you desire as an alternate exit strategy.
Inside this logic, you will check if they guessed a letter or a word based on how many characters they entered.
If they guessed a word, the logic is simple. Check if the guessed word is the same as the hidden word. If it is, then you break the loop and they are done. Otherwise, guessing loops back around.
If they guessed a letter, the logic is pretty straightforward, but more involved.
First get the character they guessed, just because it may be easier to work with this way.
char guess = input[0];
Now, look over the word for instances of that character:
//Look for instances of the character in the word.
for (int i = 0; i < word.Length; ++i)
{
//If the current index in the word matches their guess, then update the display.
if (char.ToUpperInvariant(word[i]) == char.ToUpperInvariant(guess))
display[i] = word[i];
}
The comments above should explain the idea here.
Update your displayString at the bottom of the loop so that it will check against the hidden word again:
displayString = display.ToString();
That's really all you need to do here. No fancy Linq needed.
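For reference, the letter-matching step can be pulled into one small method. This is only a sketch of the pieces described above; the names HangmanSketch and Reveal are invented here, and returning the hit count (so the caller can tell a wrong guess from a right one) is an addition of this sketch, not part of the linked demo.

```csharp
using System;
using System.Text;

public static class HangmanSketch // hypothetical name
{
    // Copies every occurrence of guess from word into display (case-insensitive)
    // and returns how many positions matched; 0 means the guess was wrong.
    public static int Reveal(string word, StringBuilder display, char guess)
    {
        int hits = 0;
        for (int i = 0; i < word.Length; ++i)
        {
            if (char.ToUpperInvariant(word[i]) == char.ToUpperInvariant(guess))
            {
                display[i] = word[i];
                hits++;
            }
        }
        return hits;
    }
}
```

With word = "banana" and a display of six '-' characters, Reveal(word, display, 'a') returns 3 and leaves display as "-a-a-a".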
Ok your code is really confusing, even with your edit.
First, why these 2 lines of code since activeWordAlphabet is a string :
char[] activeWord = activeWordAlphabet.ToCharArray();
string activeWordString = new string(activeWord);
Then you do your foreach.
For the word "FooBar", if the player types 'F', you will print
FooBar
FooBar
FooBar
FooBar
FooBar
FooBar
How does this help you in anything?
I think you have to review your algorithm. The string type has the function you need:
int chosenLetterPosition = activeWordString.IndexOf(chosenLetter, alreadyFoundPosition);
alreadyFoundPosition is an int giving the index from which the function starts searching for the letter.
IndexOf() returns -1 if the letter is not found, otherwise the zero-based index of the match.
You can save this position with your letter in a dictionary, and use it (plus one) as your new 'alreadyFoundPosition' if the chosenLetter is already in the dictionary.
This is my answer. Because I don't have a lot of tasks today :)
class Letter
{
public bool ischosen { get; set; }
public char value { get; set; }
}
class LetterList
{
public LetterList(string word)
{
_lst = new List<Letter>();
word.ToList().ForEach(x => _lst.Add(new Letter() { value = x }));
}
public bool FindLetter(char letter)
{
var search = _lst.Where(x => x.value == letter).ToList();
search.ForEach(x=>x.ischosen=true);
return search.Count > 0;
}
public string NotChosen()
{
var res = "";
_lst.Where(x => !x.ischosen).ToList().ForEach(x => { res += x.value; });
return res;
}
List<Letter> _lst;
}
How to use
var abc = new LetterList("abcdefghijklmnopqrstuvwxyz");
var answer = new LetterList("myanswer");
Console.WriteLine("This my question. Why? write your answer please");
char x = Console.ReadLine()[0];
if (answer.FindLetter(x))
{
Console.WriteLine("you are right!");
}
else
{
Console.WriteLine("fail");
}
abc.FindLetter(x);
Console.WriteLine("not chosen abc:{0} answer:{1}", abc.NotChosen(), answer.NotChosen());
At least we used to play this game like that when i was a child.

How to remove one char from a string?

Good day, I asked this question before but I wasn't specific, for which I apologize. I'm making a simple WinForms chess game using picture boxes as each cell. According to the rules, the King can't move to a cell that is targeted by an enemy piece. To implement this rule I'm using the pictureBox.Tag property and assigning a string to it. If a piece targets the cell I use pictureBox1.Tag += "D" (D as in Danger). So if two pieces are targeting it, the Tag will become "DD". My question is this: how do I remove just one 'D' from my string? Can I use the -= operator or something similar?
Assuming:
string a = "ABCDEFG";
To remove the first 'D':
a = a.Remove(a.IndexOf('D'), 1);
To remove all 'D's
a = new string(a.Where(c => c != 'D').ToArray());
Although I would recommend looking at object oriented approach. Then you could easily store references to the actual chessmen who target a spot, hence easy to modify (no need to recalculate).
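One caveat with the IndexOf/Remove combination: IndexOf returns -1 when the character is absent, and Remove then throws an ArgumentOutOfRangeException. A small guarded helper might look like this (StringSketch and RemoveFirst are names invented for the example):

```csharp
using System;

public static class StringSketch // hypothetical name
{
    // Returns s without its first occurrence of c, or s unchanged if c does not occur.
    public static string RemoveFirst(string s, char c)
    {
        int i = s.IndexOf(c);
        return i < 0 ? s : s.Remove(i, 1);
    }
}
```

For the chess tags, StringSketch.RemoveFirst("DD", 'D') gives "D", and calling it on a string with no 'D' simply returns the string as it was.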
While there are ways to do what you're trying to do, what you really want to do is ditch string manipulation and use something else. For instance, create a whole class for this square meta-information. Something like
public class SquareInfo
{
public int Danger; //the number of pieces that can move to this square.
//... Any other information about the square you want.
}
Then you could grab the tag as:
var myInfo = (SquareInfo)myBox.Tag;
if (myInfo.Danger > 2)
{
//do something
}
And so on.
I think this would solve your problem
public void Remove (PictureBox pb){
if (pb.Tag.ToString().Length > 1) {
// Greater than 1 because we need to keep one D in case of DD,
String temp = pb.Tag.ToString();
pb.Tag = temp.Substring(0, temp.Length - 1); // drop exactly one character, so "DD" becomes "D"
}
else
pb.Tag = "D";
// Tag equals D because if there is only one D, it won't be deleted
}
You can use a regex to specify to remove just one instance of a character:
Regex regex = new Regex(Regex.Escape("D"));
string output = regex.Replace("ABCDDEFG", string.Empty, 1);
Simply use the Replace method, replacing your character (as a string) with an empty string:
string s = "Daddy comes back";
s = s.Replace("D", string.Empty); // Removes every 'D' (the match is case-sensitive)
Console.WriteLine(s);
Alternatively, if you already know the index of the character to be removed, or if you need to remove a single character, consider using the Substring method.
s = s.Substring(0, i) + s.Substring(i + 1); // Remove the character at position i
Note that a C# string is immutable: in both cases, a new string instance is returned.

Parsing an inconsistent log file

I have a log file that I want to parse and load into a database. I'm struggling with the best way to go about parsing it.
The log file is in the format Category: Information
Case Number: CASE01
User ID: JOSM
Software: Microsoft Word
Date Started: 21-01-2010
Date Ended: 22-01-2010
Thing is, there's other bits and pieces thrown into the log file that mean the information isn't always present on the same line. I also only want the information, not the category.
So far, I've tried sticking it all into an array separated by \r\n, but I have to know the index of the information I want in order to consistently retrieve it, and that changes. I've also tried feeding it through StreamReader and saying
if (line.Contains("Case Number"))
{
tbReport.AppendText("Case Number: " + line.Remove(0, 13) + "\r\n");
}
Which gets me the information I want, but makes it very hard to do anything with.
I feel I'm better off going down the array path, but I could do with some guidance on how to search the array for the category, and then parse the information.
Once I can parse it accurately, adding it into a database should be fairly straight forward. As it's my first time attempting this, I'd be interested in any tips or guidance as to the best way to go about this though.
Thanks.
This will give you a collection with all key/value pairs.
var items = new List<KeyValuePair<string, string>>();
var line = reader.ReadLine();
while (line != null)
{
    int pos = line.IndexOf(':');
    if (pos >= 0) // skip lines that have no "Category: Information" separator
        items.Add(new KeyValuePair<string, string>(line.Substring(0, pos), line.Substring(pos + 1).Trim()));
    line = reader.ReadLine();
}
If you have a log class which contains all possible names as properties, you can use reflection instead:
class LogEntry
{
    public string CaseNumber { get; set; }
    public string UserID { get; set; } // "User ID" with the space removed
    public string Software { get; set; }
    public string DateStarted { get; set; }
    public string DateEnded { get; set; }
}
List<LogEntry> items = new List<LogEntry>();
var currentEntry = new LogEntry();
string line;
while ((line = reader.ReadLine()) != null)
{
    if (line == "") //empty line = new log entry. Change to your delimiter.
    {
        items.Add(currentEntry);
        currentEntry = new LogEntry();
        continue;
    }
    int pos = line.IndexOf(':');
    if (pos < 0) continue; // not a "Category: Information" line
    var name = line.Substring(0, pos).Replace(" ", string.Empty);
    var value = line.Substring(pos + 1).Trim();
    var pi = currentEntry.GetType().GetProperty(name);
    if (pi != null) // ignore categories that have no matching property
        pi.SetValue(currentEntry, value, null);
}
items.Add(currentEntry); // the last entry is not followed by a delimiter line
Note that I've not tested the code (just written it directly in here). You have to add error checking and such. The last alternative is not very performant as it is, but should do OK.
Sounds like a good case candidate for RegExp :
http://www.regular-expressions.info/dotnet.html
They're not too easy to learn, but once you get the basic understanding, they can't be beaten for that kind of task.
It's not really a simple answer, but have you maybe thought about using a regular expression for parsing the information out?
Regular expressions are kinda hardcore stuff, but they can parse advanced files quite easily.
So from what I can see, the format is like this:
A line starts with A-Z, then (a-z or A-Z or 0-9 or space) repeated any number of times, followed by a : and then a space, and then the value.
So if you make a regular expression for that (if you wait a while I will try to make one for you), then you could test each line with it. If it matches, then we can also use the regular expression to take out the last part and the "key". If it doesn't match, then we just append the line to the last key.
Beware that it's not totally foolproof, as an ordinary line could just happen to start this way, but it's kinda the best thing we can do, I think.
As promised here is a starting point for your regular expression:
^(?'key'[A-Z][a-zA-Z0-9\s]+):\s(?'value'.+)
To explain what it does, we need to go through each part:
^ ensures that a match starts at the beginning of a line.
(?'key' is the syntax to begin a named "capture" group. The regular expression will then give us easy access to the "key" part of the match.
We start that group with [A-Z], a character class that matches any capital letter, but only one.
[a-zA-Z0-9\s]+ is like the previous class, but for all capital or small letters, digits and whitespace (\s); the plus after the class means it must match one or more times.
Then we just end the group, and put in our : and then a space.
We then begin a new group, the value group, just like the key group.
Then we just write . (which matches any character except a newline), followed by a + to make it match one or more of them.
I actually think that you can just take the whole string and match it all at once:
use Regex.Matches with RegexOptions.Multiline (so that ^ matches at the start of every line) and loop over the matches. Note that \s also matches line breaks, so a literal space in the character class is safer when matching the whole string.
Then just take match.Groups["key"] and match.Groups["value"] and put them into your array. (Sorry, I don't have Visual Studio handy to test it out.)
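A hedged sketch of that whole-string approach. The pattern here is a variation on the one above, not the answer's exact expression: it uses a literal space instead of \s in the key so a key cannot span two lines, and [^\r\n]* for the value so a trailing carriage return is not captured. LogRegexSketch is an invented name, and storing the pairs in a dictionary (last value wins for a repeated category) is an assumption of this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogRegexSketch // hypothetical name
{
    // Multiline makes ^ match at the start of every line, not just the start of the string.
    static readonly Regex LinePattern = new Regex(
        @"^(?'key'[A-Z][A-Za-z0-9 ]*):[ \t]*(?'value'[^\r\n]*)",
        RegexOptions.Multiline);

    public static Dictionary<string, string> Parse(string log)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in LinePattern.Matches(log))
        {
            // The indexer overwrites earlier values if a category appears twice.
            result[m.Groups["key"].Value] = m.Groups["value"].Value.TrimEnd();
        }
        return result;
    }
}
```

Lines that do not look like "Category: Information" (for example ones starting with a lowercase letter or a digit) are simply skipped, which is one way to cope with the extra bits and pieces in the file.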

Use and parse a text file in C# to initialize a component based game model

I have a text file that should initialize my objects, which are built around a component-based model. It is my first time trying to use a data-driven approach and I'm not sure if I am heading in the right direction here.
The file I currently have in mind looks like this:
EliteGoblin.txt
#Goblin.txt
[general]
hp += 20
strength = 12
description = "A big menacing goblin"
tacticModifier += 1.3
[skills]
fireball
The # symbol says which other files to parse at that point.
The names in [] correspond to component classes in the code.
Below them is how to configure them.
For example, hp += 20 would take the value from Goblin.txt and increase it by 20, etc.
My question is how I should go about parsing this file. Is there some sort of parser built into C#?
Could I change the format of my document to match an already defined format that has support in .NET?
How do I go about working out the type of each value (int/float/string)?
Does this seem a viable solution at all?
Thanks in advance, Xtapodi.
Drop the flat file and pick up XML. Definitely look into XML serialization. You can simply create all of your objects in C# as classes, serialize them into XML, and reload them into your application without having to worry about parsing a flat file. Because your objects will act as the schema for your XML, you won't have to worry about casting objects and writing a huge parsing routine; .NET will handle it for you. You will save many moons of headache.
For instance, you could rewrite your class to look like this:
public class Monster
{
public GeneralInfo General {get; set;}
public SkillsInfo Skills {get; set;}
}
public class GeneralInfo
{
public int Hp {get; set;}
public string Description {get; set;}
public double TacticModifier {get; set;}
}
public class SkillsInfo
{
public string[] SkillTypes {get; set;}
}
...and your objects would get serialized to XML that looks something like...
<Monster>
<General>
<Hp>20</Hp>
<Description>A big menacing goblin</Description>
<TacticModifier>1.3</TacticModifier>
</General>
<Skills>
<SkillTypes>
<string>Fireball</string>
<string>Water</string>
</SkillTypes>
</Skills>
</Monster>
..Some of my class names, hierarchy, etc. may be wrong, as I punched this in real quick, but you get the general gist of how serialization will work.
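To make the round trip concrete, here is a small sketch using System.Xml.Serialization.XmlSerializer with the classes from this answer (repeated so the block stands alone). MonsterXml, Save and Load are names chosen for the example; a real game would probably read from and write to files rather than strings.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class GeneralInfo
{
    public int Hp { get; set; }
    public string Description { get; set; }
    public double TacticModifier { get; set; }
}

public class SkillsInfo
{
    public string[] SkillTypes { get; set; }
}

public class Monster
{
    public GeneralInfo General { get; set; }
    public SkillsInfo Skills { get; set; }
}

public static class MonsterXml // hypothetical helper name
{
    // Serializes a Monster to an XML string.
    public static string Save(Monster monster)
    {
        var serializer = new XmlSerializer(typeof(Monster));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, monster);
            return writer.ToString();
        }
    }

    // Rebuilds a Monster from XML produced by Save.
    public static Monster Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(Monster));
        using (var reader = new StringReader(xml))
        {
            return (Monster)serializer.Deserialize(reader);
        }
    }
}
```

XmlSerializer needs public types with a parameterless constructor, which these classes have. Note that this approach stores final values only; the "+=" inheritance from the original flat file would still have to be applied in code after loading.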
You might want to check out Sprache, a .NET library for building DSLs, by Autofac creator Nicholas Blumhardt. From the Google site:
Sprache is a small library for constructing parsers directly in C# code.
It isn't an "industrial strength" framework - it fits somewhere in between regular expressions and a full-blown toolset like ANTLR.
Usage
Unlike most parser-building frameworks, you use Sprache directly from your program code, and don't need to set up any build-time code generation tasks. Sprache itself is a single tiny assembly.
A simple parser might parse a sequence of characters:
// Parse any number of capital 'A's in a row
var parseA = Parse.Char('A').AtLeastOnce();
Sprache provides a number of built-in functions that can make bigger parsers from smaller ones, often callable via Linq query comprehensions:
Parser<string> identifier =
    from leading in Parse.WhiteSpace.Many()
    from first in Parse.Letter.Once()
    from rest in Parse.LetterOrDigit.Many()
    from trailing in Parse.WhiteSpace.Many()
    select new string(first.Concat(rest).ToArray());
var id = identifier.Parse(" abc123 ");
Assert.AreEqual("abc123", id);
The linked article builds a questionnaire that is driven by a simple text file with the following format:
identification "Personal Details"
[
name "Full Name"
department "Department"
]
employment "Current Employer"
[
name "Your Employer"
contact "Contact Number"
#months "Total Months Employed"
]
There is no builtin function to do exactly what you want to do, but you can easily write it.
void ParseLine(string charClass, string line) {
    // your code to parse line here...
    Console.WriteLine("{0} : {1}", charClass, line);
}
void ParseFile(string fileName) {
    string currentClass = "";
    // StreamReader opens the file; a StringReader would read the file name itself.
    using (StreamReader sr = new StreamReader(fileName)) {
        string line;
        while ((line = sr.ReadLine()) != null) {
            line = line.Trim();
            if (line.Length == 0) continue; // skip blank lines
            if (line[0] == '#') {
                string embeddedFile = line.Substring(1);
                ParseFile(embeddedFile);
            }
            else if (line[0] == '[') {
                currentClass = line.Substring(1, line.Length - 2); // strip the surrounding [ ]
            }
            else ParseLine(currentClass, line);
        }
    }
}
What you want to do isn't going to be easy. The statistic inheritance in particular.
So, unless you can find some existing code to leverage, I suggest you start with simpler requirements with a view to adding the more involved functionality later and build up the functionality incrementally.
