The goal is to take a string input (coming from the frontend) and transform it so that it acts as an escaped character in the backend.
In the following example I want the user to write "\" + "t", and the backend should interpret it as "\t" (= tab char):
var inputStr = @"\t"; // The input is a string written by a user: "\t" (backslash char + t char == @"\t" != "\t")
var outputStr = SomeOperation(inputStr); // ???
Console.WriteLine("A" + outputStr + "B <= should be tab separated");
I have tried:
var outputStr = inputStr.Replace("\\", "");
This isn't something that is built in. Ultimately, "\t" == (a string of length 1 containing a tab character) is implemented by the C# compiler, not the runtime. There isn't a pre-existing implementation of this in the runtime, in part because each language (VB.NET, C#, F#, etc) can have their own rules.
You would need to write your own implementation with your own definitions of escape sequences. Fortunately, it is mostly an exercise in .Replace(...). There are some edge cases to think about, though, particularly around ordering. For example, if \\ becomes \ and \n becomes newline, does \\n become \n, or does it become \(newline)? Done naively, it can end up as just (newline), e.g. with foo.Replace(@"\\",@"\").Replace(@"\n","\n")
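One way to sidestep the ordering problem is to scan the input once, left to right, and decide at each backslash what it means. The following is a minimal sketch under assumptions: the class name EscapeDemo, the method name Unescape, and the supported set of escapes (\t, \n, \\) are illustrative choices, not anything built into the runtime.

```csharp
using System;
using System.Text;

static class EscapeDemo
{
    // Hypothetical helper: converts the two-character sequences \t, \n and \\
    // into a tab, a newline and a single backslash in one left-to-right pass.
    public static string Unescape(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            // Ordinary character, or a lone backslash at the very end: copy as-is.
            if (c != '\\' || i == input.Length - 1)
            {
                sb.Append(c);
                continue;
            }
            char next = input[++i]; // consume the character after the backslash
            switch (next)
            {
                case 't': sb.Append('\t'); break;
                case 'n': sb.Append('\n'); break;
                case '\\': sb.Append('\\'); break;
                // Unknown escape: keep both characters unchanged (an assumption).
                default: sb.Append('\\').Append(next); break;
            }
        }
        return sb.ToString();
    }
}
```

Because each backslash is consumed together with the character that follows it, the input \\n comes out as a backslash followed by n rather than as a newline.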
You can do something like this:
void Main()
{
    Debug.Assert(ReplaceChar("hello\tworld", @"\t") == "helloworld"); // passed
}

string ReplaceChar(string str, string userInput)
{
    switch (userInput)
    {
        case @"\t":
            return str.Replace("\t", "");
    }
    return str;
}
I have finally found an easy way to do it:
Regex.Unescape(inputStr);
See the documentation of the Regex.Unescape function for more details.
Example:
var outputStr = Regex.Unescape("\\t");
// ✓ Result: outputStr == "\t"
var outputStr = Char.Parse("\t").ToString();
gives
A B <= should be tab separated
The tab does not show here, but it renders properly in the console.
I am making a simple compiler, and am working on string parsing. At the moment, my code is:
while (stringToParse.Contains(" + ") || stringToParse.Contains("+ ") || stringToParse.Contains(" +")) {
stringToParse = stringToParse.Replace(" +", "+").Replace("+ ", "+").Replace(" + ", "+");
}
string[] splitString = stringToParse.Split("+");
But something like:
"\"hello \" + \"world \" + \" + \" + \"hello\""
Would return:
["\"hello \"", "\"world \"", "\"", "\"", ]
(without backslashes)
But something like:
""hello " + "world " + " + " + "hello""
Would return:
[""hello "", ""world "", """, """, ]
So how can I specify whether a " + " is inside a string or acts as a separator? Is there maybe a way to detect something like the following?
...(any number of non " or + characters)...+...(any number of " or + characters)
My expected output would be:
[""hello "", ""world "", ""+""]
Explicit State Machine
To do this without using any dedicated library, I suggest building a state machine.
You will iterate over the characters of the string, and depending on which character you encounter you update the state of the machine. Optimizations are possible, however, let us begin with conventional clarity.
var characters = input.ToCharArray();
var results = new List<string>();
var current = string.Empty;
// 0 = not inside quotes, we expect +
// 1 = not inside quotes, we expect "
// 2 = inside quotes
var state = 1;
foreach (var character in characters)
{
    switch (state)
    {
        case 0:
            // We are not inside quotes, we expect +
            if (character == '+')
            {
                state = 1;
                continue;
            }
            if (char.IsWhiteSpace(character))
            {
                continue;
            }
            // error?
            break;
        case 1:
            // We are not inside quotes, we expect "
            if (character == '\"')
            {
                state = 2;
                continue;
            }
            if (char.IsWhiteSpace(character))
            {
                continue;
            }
            // error?
            break;
        case 2:
            // We are inside quotes, we expect "
            if (character == '\"')
            {
                state = 0;
                results.Add(current);
                current = string.Empty;
                continue;
            }
            current += character;
            break;
        default:
            // error?
            break;
    }
}
if (state != 0)
{
    // error
}
// You can use results.ToArray();
Possible optimizations:
We can use a StringBuilder instead of concatenations.
Also, we can use IndexOf to find the next relevant character.
We can check if a string (a chunk of characters) is empty or white space (perhaps using IsNullOrWhiteSpace).
We can use AsSpan so we can work with ReadOnlySpan instead.
You can also see how you can add support for your own escape sequences, or any other stuff.
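As a rough illustration of the IndexOf optimization listed above, the sketch below jumps from quote to quote instead of visiting every character. The names QuotedSplitter and Split are hypothetical, and this simplified version does not check that a + actually sits between the string literals; it only extracts what is inside each pair of quotes.

```csharp
using System;
using System.Collections.Generic;

static class QuotedSplitter
{
    // Hypothetical helper: returns the contents of every "..." literal in input.
    // Assumption: no escape sequences inside the literals, and the text between
    // literals (whitespace and +) is not validated.
    public static List<string> Split(string input)
    {
        var results = new List<string>();
        int pos = 0;
        while (true)
        {
            int open = input.IndexOf('"', pos);       // next opening quote
            if (open < 0) break;                       // no more literals
            int close = input.IndexOf('"', open + 1); // matching closing quote
            if (close < 0) throw new FormatException("Unterminated string literal.");
            results.Add(input.Substring(open + 1, close - open - 1));
            pos = close + 1;                           // continue after the literal
        }
        return results;
    }
}
```

For the input from the question, this yields four entries, with the third being the quoted " + " kept intact.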
Implicit State Machine (with helper class)
I want to point out that this is not the only way to organize this code. If I were you, I would create a pseudo-iterator class with two methods:
A method that returns the next character... or better yet, that returns true if the next character matches a parameter (and advances), or false (and does not advance).
A method that returns all the characters until the next instance of a particular character (and advances to there).
The main advantage of such an approach is that I would no longer have to step character by character, so I would not need a state variable. Instead, I could let the code structure resemble the shape of my grammar.
Wait, I have written such a class: StringProcessor. It is part of the Theraot.Core NuGet package, where it is used to parse strings to BigInteger.
var processor = new Theraot.Core.StringProcessor(input);
var results = new List<string>();
while (!processor.EndOfString)
{
    // SkipWhile skips all the characters that match
    processor.SkipWhile(char.IsWhiteSpace);
    // Read returns true (and advances after) if what is next matches the parameter
    if (processor.Read('"'))
    {
        // ReadUntil returns everything found before the parameter and advances up to it.
        // Note: it does not advance past the parameter.
        results.Add(processor.ReadUntil('"'));
        processor.Read('"');
    }
    processor.SkipWhile(char.IsWhiteSpace);
    if (!processor.Read('+'))
    {
        // error?
    }
}
Please notice that a class such as the StringProcessor used above cuts a lot of fluff, which makes it viable for simple languages.
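If you would rather not take a dependency, a helper of this shape is small enough to write by hand. The sketch below is a hypothetical Cursor class that mimics the three operations used above (Read, ReadUntil, SkipWhile); it is not the Theraot.Core implementation, and the error handling through FormatException is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a string-processing helper; not Theraot.Core.
sealed class Cursor
{
    private readonly string _text;
    private int _pos;

    public Cursor(string text) { _text = text; }

    public bool AtEnd => _pos >= _text.Length;

    // Advances and returns true only if the next character equals expected.
    public bool Read(char expected)
    {
        if (AtEnd || _text[_pos] != expected) return false;
        _pos++;
        return true;
    }

    // Returns everything before the next stop character and stops in front of it
    // (or at the end of the text when stop does not occur).
    public string ReadUntil(char stop)
    {
        int end = _text.IndexOf(stop, _pos);
        if (end < 0) end = _text.Length;
        string chunk = _text.Substring(_pos, end - _pos);
        _pos = end;
        return chunk;
    }

    // Skips characters for as long as the predicate holds.
    public void SkipWhile(Func<char, bool> predicate)
    {
        while (!AtEnd && predicate(_text[_pos])) _pos++;
    }
}

static class CursorDemo
{
    // Parses "a" + "b" + ... and returns the contents of each literal.
    public static List<string> Parse(string input)
    {
        var cursor = new Cursor(input);
        var results = new List<string>();
        while (!cursor.AtEnd)
        {
            cursor.SkipWhile(char.IsWhiteSpace);
            if (!cursor.Read('"')) throw new FormatException("Expected opening quote.");
            results.Add(cursor.ReadUntil('"'));
            if (!cursor.Read('"')) throw new FormatException("Expected closing quote.");
            cursor.SkipWhile(char.IsWhiteSpace);
            if (!cursor.AtEnd && !cursor.Read('+')) throw new FormatException("Expected '+'.");
        }
        return results;
    }
}
```

Note how the body of Parse reads like the informal grammar: whitespace, a quoted string, whitespace, then a plus sign.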
Custom Tokenizer
Of course, for something more complex you might want to look for a tokenizer.
To give you an example, consider that this is the "grammar" we have:
Document: Many
{
Whitespace
String:
{
QuoteSymbol
NonQuoteSymbol
QuoteSymbol
}
Whitespace
PlusSymbol
}
No, this is not any of the usual metalanguages. However, written this way it is easier to see how the code we had above resembles the language.
Would it not be nice to write as follows?
var QuoteSymbol = Pattern.Literal("QuoteSymbol", '"');
var NonQuoteSymbol = Pattern.Custom("NonQuoteSymbol", s => s.ReadUntil('"'));
var String = Pattern.Conjunction("String", QuoteSymbol, NonQuoteSymbol, QuoteSymbol);
var WhiteSpace = Pattern.Custom("WhiteSpace", s => s.ReadWhile(char.IsWhiteSpace));
var PlusSymbol = Pattern.Literal("PlusSymbol", '+');
var Document = Pattern.Repetition(
    Pattern.Conjunction(WhiteSpace, String, WhiteSpace, PlusSymbol)
);
var results = from TerminalSymbol symbol
              in Document.Parse(input)
              where symbol.Pattern == String
              select symbol.ToString();
Writing code like that would make it easier to modify the language. Well, we are still writing code, however you could imagine parsing a file that has the grammar of the language you want to parse... Fancy!
As you might expect, it requires extra work to build the necessary code to make it work. Or, you know, get some code that already works (the linked code is built around StringProcessor).
Language Toolkits
The code presented earlier is not suitable to be used for a prettyprinter and is not capable of recovering from a syntax error. It can be modified to do such things. Neither will it integrate with code editors at any level.
If you want a fully fledged solution, I have two suggestions:
Irony
Nitra
These are the kind of tools you would use if you wanted to create a programming language on top.
And of course, I should link you to "Compilers: Principles, Techniques, and Tools" usually just known as "The Dragon Book".
How do I get the whole text of a document concatenated into a single string? I'm trying to split text by dot: string[] words = s.Split('.'); I want to take this text from a text document. But if my text document contains empty lines or line breaks between strings, for example:
pat said, “i’ll keep this ring.”
she displayed the silver and jade wedding ring which, in another time track,
she and joe had picked out; this
much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
result looks like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track
3. she and joe had picked out this
4. much of the alternate world she had elected to retain
5. he wondered what if any legal basis she had kept in addition
6. none he hoped wisely however he said nothing
7. better not even to ask
but desired correct output should be like this:
1. pat said ill keep this ring
2. she displayed the silver and jade wedding ring which in another time track she and joe had picked out this much of the alternate world she had elected to retain
3. he wondered what if any legal basis she had kept in addition
4. none he hoped wisely however he said nothing
5. better not even to ask
So to do this first I need to process text file content to get whole text as single string, like this:
pat said, “i’ll keep this ring.” she displayed the silver and jade wedding ring which, in another time track, she and joe had picked out; this much of the alternate world she had elected to retain. he wondered what - if any - legal basis she had kept in addition. none, he hoped; wisely, however, he said nothing. better not even to ask.
I can't do this the same way as I would with list content, for example: string concat = String.Join(" ", text.ToArray());
I'm not sure how to concatenate the text from a text document into a string.
I think this is what you want:
var fileLocation = @"c:\myfile.txt";
var stringFromFile = File.ReadAllText(fileLocation);
// replace Environment.NewLine with whatever newline sequence your file uses;
// a space keeps words on adjacent lines from running together
var withoutNewLines = stringFromFile.Replace(Environment.NewLine, " ");
// modify to remove any unwanted character
var withoutUglyCharacters = Regex.Replace(withoutNewLines, "[“’”,;-]", "");
// collapse the runs of whitespace left behind by the previous steps
var withoutTwoSpaces = Regex.Replace(withoutUglyCharacters, @"\s+", " ");
var result = withoutTwoSpaces.Split('.').Select(i => i.Trim()).Where(i => i != "").ToList();
So first you read all the text from your file, then you remove all unwanted characters, and then you split by . and return the non-empty items.
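The same idea can be tried without touching the file system by working on a string directly. This is a small sketch, assuming hypothetical names SentenceSplitter and Split; it only normalizes whitespace and splits on periods, and leaves punctuation removal out.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SentenceSplitter
{
    // Hypothetical helper: flattens all line breaks and blank lines into single
    // spaces, then splits on '.' and drops empty pieces.
    public static string[] Split(string text)
    {
        // \s+ matches any run of whitespace, including \r\n and empty lines.
        string singleLine = Regex.Replace(text, @"\s+", " ");
        return singleLine
            .Split('.')
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)
            .ToArray();
    }
}
```

To use it with a file, pass in the result of File.ReadAllText(path).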
Have you tried replacing double new-lines before splitting using a period?
static string[] GetSentences(string filePath) {
    if (!File.Exists(filePath))
        throw new FileNotFoundException($"Could not find file { filePath }!");
    // Join with a space so that words on adjacent lines do not run together
    var lines = string.Join(" ", File.ReadLines(filePath).Where(line => !string.IsNullOrWhiteSpace(line)));
    var sentences = Regex.Split(lines, @"\.[\s]{1,}?");
    return sentences;
}
I haven't tested this, but it should work.
Explanation:
if (!File.Exists(filePath))
    throw new FileNotFoundException($"Could not find file { filePath }!");
Throws an exception if the file could not be found. It is advisable to surround the method call with a try/catch.
var lines = string.Join(" ", File.ReadLines(filePath).Where(line => !string.IsNullOrWhiteSpace(line)));
Creates a single string, ignoring any lines which are empty or purely whitespace.
var sentences = Regex.Split(lines, @"\.[\s]{1,}?");
Creates a string array, where the string is split at every period that is followed by whitespace.
E.g:
The string "I came. I saw. I conquered" would become
I came
I saw
I conquered
Update:
Here's the method as a one-liner, if that's your style?
static string[] SplitSentences(string filePath) => File.Exists(filePath) ? Regex.Split(string.Join(" ", File.ReadLines(filePath).Where(line => !string.IsNullOrWhiteSpace(line))), @"\.[\s]{1,}?") : null;
I would suggest iterating through all the characters and checking whether each one is in the range 'a' <= char <= 'z' or is a space (char == ' '). If it matches the condition, add it to the newly created string; otherwise check whether it is a '.' character, and if it is, end the current line and start another one:
List<string> lines = new List<string>();
string line = string.Empty;
foreach (char c in str)
{
    if ((char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == 0x20)
        line += c;
    else if (c == '.')
    {
        lines.Add(line.Trim());
        line = string.Empty;
    }
}
Working online example
Or if you prefer "one-liner"s :
IEnumerable<string> lines = new string(str.Select(c => (char)(((char.ToLower(c) >= 'a' && char.ToLower(c) <= 'z') || c == 0x20) ? c : c == '.' ? '\n' : '\0')).ToArray()).Split('\n').Select(s => s.Trim());
I may be wrong about this, but I would think that you may not want to alter the string if you are only splitting it. For example, there are single/double quotes (“) in part of the string, and removing them may not be desired. This raises another question: reading a text file that contains single/double quotes (as your example data shows) like below:
var stringFromFile = File.ReadAllText(fileLocation);
may not display those characters properly in a text box or the console if the file was saved in an ANSI code page, because the default encoding used by the ReadAllText method is UTF-8. For example, the single/double quotes will display as replacement characters (diamonds) in a text box on a form and as a question mark (?) in the console. To keep the single/double quotes and have them display properly, you can pass the OS's current ANSI encoding as a parameter to the ReadAllText method like below:
// Encoding.Default is the ANSI code page on .NET Framework; on .NET Core and later it is always UTF-8
string stringFromFile = File.ReadAllText(fileLocation, Encoding.Default);
Below is code using a simple Split call to split the string on periods (.). Hope this helps.
private void button1_Click(object sender, EventArgs e) {
    string fileLocation = @"C:\YourPath\YourFile.txt";
    string stringFromFile = File.ReadAllText(fileLocation, Encoding.Default);
    // a space keeps words on adjacent lines from running together
    string bigString = stringFromFile.Replace(Environment.NewLine, " ");
    string[] result = bigString.Split('.');
    int count = 1;
    foreach (string s in result) {
        if (s.Trim() != "") {
            textBox1.Text += count + ". " + s.Trim() + Environment.NewLine;
            Console.WriteLine(count + ". " + s.Trim());
            count++;
        }
        else {
            // period at the end of the string
        }
    }
}
I'm woefully attempting a programming assignment. I'm not looking for a "this is how you do this" but more of a "what am I doing wrong?"
I'm attempting to capitalize the start of each sentence from a string input. So for example the string "Hello. my name is john. i like to ride bikes." I would modify the string and return it with capitals for example: "Hello. My name is john. I like to ride bikes." My logic seems a bit flawed and I'm very lost.
This is what I have so far. Basically, all I'm doing is testing for punctuation signifying the end of a sentence and then trying to replace the character. I'm also testing whether it's at the end of the string so as not to cause IndexOutOfRange exceptions. Although, that's all I've been getting :(
private string SentenceCapitalizer(string input)
{
for (int i = 0; i < input.Length; i++)
{
if (input[i] == '.' || input[i] == '!' || input[i] == '?')
{
if (!(input[i] == input.Length))
{
input.Replace(input[i + 2], char.ToUpper(input[i + 2]));
}
}
}
return input;
}
Any help is greatly appreciated. I'm just learning C# so the most basic of help would be of service. I don't know much :P
Instead of
if (!(input[i + 2] >= input.Length))
It should be
if (!(i + 2 >= input.Length))
You need to compare indices here, not characters.
You are checking if your current index is less than or equal to the length of the string and then attempting to alter an index 2 further along
if (!(input[i] == input.Length))
{
input.Replace(input[i + 2], char.ToUpper(input[i + 2]));
}
Should be changed to
if (!((i + 2) >= input.Length))
{
input.Replace(input[i + 2], char.ToUpper(input[i + 2]));
}
This will check that there is a value 2 places after a punctuation mark. Also make use of >= rather than == since you're jumping 2 you might end up going over the length of the array where == still returns false but there is no index.
Strings are immutable, you can't do:
var str = "123";
str.Replace('1', '2');
You have to do:
var str = "123";
str = str.Replace('1', '2');
Ok, others have provided you with some pointers to stop the obvious errors, but I'll try to give you some thoughts on how to best implement this.
It is worth thinking about this as a 3-step process
Tokenize the string into sentences
Ensure that the first character of each token is uppercase
reconstruct the string by joining the tokens back together
(1) I'll leave to your imagination, but the idea is to end up with an array of strings with each element representing a "sentence" according to your requirement
(2) Is pretty much as simple as
// Uppercase character 0, and join it to everything from character 1 onwards
var fixedToken = char.ToUpper(token[0], CultureInfo.CurrentCulture)
    + token.Substring(1);
(3) Is also simple
// reconstruct string by joining all tokens with a space
var reconstructed = String.Join(" ",tokens);
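For comparison, here is a single-pass alternative that avoids both the index arithmetic and the tokenizing step. It is a sketch, not the approach from the answers above: the names Capitalizer and CapitalizeSentences are hypothetical, and it assumes that every '.', '!' or '?' ends a sentence (so abbreviations such as "e.g." would be treated as sentence ends too).

```csharp
using System;
using System.Text;

static class Capitalizer
{
    // Hypothetical helper: uppercases the first letter of the input and the
    // first letter that follows each '.', '!' or '?'.
    public static string CapitalizeSentences(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool capitalizeNext = true; // the very first letter starts a sentence
        foreach (char c in input)
        {
            if (capitalizeNext && char.IsLetter(c))
            {
                sb.Append(char.ToUpperInvariant(c));
                capitalizeNext = false;
            }
            else
            {
                sb.Append(c);
            }
            if (c == '.' || c == '!' || c == '?')
            {
                capitalizeNext = true; // the next letter begins a new sentence
            }
        }
        return sb.ToString();
    }
}
```

Since strings are immutable, the result is built in a StringBuilder and returned as a new string rather than modified in place.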
I would like to check some strings for invalid characters. By invalid characters I mean characters that should not be there. What characters are these? That varies, but I think that's not so important; what matters is how I should do it, and what the easiest and best (in terms of performance) way to do it is.
Let say I just want strings that contains 'A-Z', 'empty', '.', '$', '0-9'
So if i have a string like "HELLO STaCKOVERFLOW" => invalid, because of the 'a'.
OK, now how do I do that? I could make a List<char>, put every char in it that is not allowed, and check the string against this list. Maybe not a good idea, because there are a lot of chars then. But I could make a list that contains all of the allowed chars, right? And then? For every char in the string I have to compare against the List<char>? Any smart code for this? And another question: if I add A-Z to the List<char>, I have to add 26 chars manually, but as far as I know these chars are 65-90 in the ASCII table; can I add them more easily? Any suggestions? Thank you.
You can use a regular expression for this:
Regex r = new Regex("[^A-Z0-9.$ ]");
if (r.IsMatch(SomeString)) {
// validation failed
}
To create a list of characters from A-Z or 0-9 you would use a simple loop:
for (char c = 'A'; c <= 'Z'; c++) {
// c or c.ToString() depending on what you need
}
But you don't need that with the Regex - pretty much every regex engine understands the range syntax (A-Z).
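The check can also be phrased positively, as "the whole string consists of allowed characters". The sketch below assumes hypothetical names (RegexValidator, IsValid) and uses the character set from the question. It anchors with \A and \z rather than ^ and $, because in .NET $ also matches just before a trailing newline.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexValidator
{
    // Whole-string match: only A-Z, 0-9, '.', '$' and space are allowed.
    // Inside a character class, '.' and '$' are literal characters.
    private static readonly Regex Valid = new Regex(@"\A[A-Z0-9.$ ]*\z");

    // Hypothetical helper: true when s contains no character outside the set.
    public static bool IsValid(string s) => Valid.IsMatch(s);
}
```

With the * quantifier the empty string counts as valid; use + instead if at least one character is required.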
I have only just written such a function, along with an extended version that restricts the first and last characters when needed. The original function merely checks whether the string consists of valid characters only. The extended function adds two integers: the number of valid characters at the beginning of the list to be skipped when checking the first and the last character, respectively. In practice it simply calls the original function three times. In the example below, it ensures that the string begins with a letter and doesn't end with an underscore.
StrChr(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
StrChrEx(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", 11, 1);
BOOL __cdecl StrChr(CHAR* str, CHAR* chars)
{
for (int s = 0; str[s] != 0; s++)
{
int c = 0;
while (true)
{
if (chars[c] == 0)
{
return false;
}
else if (str[s] == chars[c])
{
break;
}
else
{
c++;
}
}
}
return true;
}
BOOL __cdecl StrChrEx(CHAR* str, CHAR* chars, UINT excl_first, UINT excl_last)
{
char first[2] = {str[0], 0};
char last[2] = {str[strlen(str) - 1], 0};
if (!StrChr(str, chars))
{
return false;
}
if (excl_first != 0)
{
if (!StrChr(first, chars + excl_first))
{
return false;
}
}
if (excl_last != 0)
{
if (!StrChr(last, chars + excl_last))
{
return false;
}
}
return true;
}
If you are using C#, you can do this easily using a List and Contains. This works the same way for single characters (in a string) and for multi-character strings.
var pn = "The String To ChecK";
var badStrings = new List<string>()
{
    " ", "\t", "\n", "\r"
};
foreach (var badString in badStrings)
{
    if (pn.Contains(badString))
    {
        // Do something
    }
}
If you're not super good with regular expressions, then there is another way to go about this in C#. Here is a block of code I wrote to test a string variable named notifName:
var alphabet = "a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z";
var numbers = "0,1,2,3,4,5,6,7,8,9";
var specialChars = " ,(,),_,[,],!,*,-,.,+,-";
var validChars = (alphabet + "," + alphabet.ToUpper() + "," + numbers + "," + specialChars).Split(',');
for (int i = 0; i < notifName.Length; i++)
{
if (Array.IndexOf(validChars, notifName[i].ToString()) < 0) {
errorFound = $"Invalid character '{notifName[i]}' found in notification name.";
break;
}
}
You can change the characters added to the array as needed. The Array IndexOf method is the key to the whole thing. Of course if you want commas to be valid, then you would need to choose a different split character.
Not enough reps to comment directly, but I recommend the Regex approach. One small caveat: you probably need to anchor both ends of the input string, and you will want at least one character to match. So (with thanks to ThiefMaster), here's my regex to validate user input for a simple arithmetical calculator (plus, minus, multiply, divide):
Regex r = new Regex(@"^[0-9\.\-\+\*\/ ]+$");
I'd go with a regex, but still need to add my 2 cents here, because all the proposed non-regex solutions are O(MN) in the worst case (string is valid) which I find repulsive for religious reasons.
Even more so when LINQ offers a simpler and more efficient solution than nesting loops:
var isInvalid = "The String To Test".Intersect("ALL_INVALID_CHARS").Any();
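A whitelist version of the same idea is sketched below, in case listing the allowed characters is easier than listing the invalid ones. The names Validator, Allowed and IsValid are hypothetical, and the character set is the one from the question. A HashSet<char> gives constant-time lookups on average, so the check is linear in the length of the string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Validator
{
    // Allowed characters: space, '.', '$', A-Z and 0-9 (taken from the question).
    private static readonly HashSet<char> Allowed = BuildAllowed();

    private static HashSet<char> BuildAllowed()
    {
        var set = new HashSet<char> { ' ', '.', '$' };
        for (char c = 'A'; c <= 'Z'; c++) set.Add(c); // ranges via a simple loop
        for (char c = '0'; c <= '9'; c++) set.Add(c);
        return set;
    }

    // Hypothetical helper: true when every character of s is in the whitelist.
    public static bool IsValid(string s) => s.All(Allowed.Contains);
}
```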
This question is for C Sharp (and Java maybe :).
When I want to display a message to the console, I want to insert after each "+" a blank space. How can I do this, without inserting manually that blank space?
Try this:
var text = string.Join(" ", new[] {foo, bar, other });
You can't, really - just put it in explicitly:
Console.WriteLine(foo + " " + bar);
or
System.out.println(foo + " " + bar);
I mean you could write a method with a parameter array / varargs parameter, e.g. (C#)
public void WriteToConsole(params object[] values)
{
    string separator = "";
    foreach (object value in values)
    {
        Console.Write(separator);
        separator = " ";
        Console.Write(value);
    }
}
... but personally I wouldn't.
If you're looking for a way to tidy your printing routine, try String.Format, e.g.
Console.WriteLine(String.Format("{0} {1}", string1, string2));
In C#:
string.Join(" ", "Foo", "Bar", "Baz");
In Java:
String.join(" ", "Foo", "Bar", "Baz");
Each of these methods permits a variable number of strings to join, and each has various overloads to pass in collections of strings too.
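As a short illustration of the collection overload mentioned above, the C# sketch below joins a sequence of strings with single spaces. The names JoinDemo and WithSpaces are hypothetical wrappers around string.Join.

```csharp
using System;
using System.Collections.Generic;

static class JoinDemo
{
    // Hypothetical wrapper: joins any sequence of strings with a single space.
    public static string WithSpaces(IEnumerable<string> values)
    {
        return string.Join(" ", values);
    }
}
```

Calling JoinDemo.WithSpaces(new List<string> { "Foo", "Bar", "Baz" }) gives the same result as the params call shown earlier.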
You can replace "+" with "+ ". Something like this:
new String("Foo+Bar").replace("+", "+ ");
Do you mean a concatenation of strings or just a '+' character? In Java, if there are lot of parameters to show within an output string you can use String.format method like this: String.format("First: %s, second: %s, third: %s etc", param1, param2, param3). In my opinion it's more readable than chained concatenation with '+' operator.
In C# you can use String Interpolation as well using the $ special character, which identifies a string literal as an interpolated string
string text1 = "Hello";
string text2 = "World!";
Console.WriteLine($"{text1}, {text2}");
Output
Hello, World!
From Docs
String interpolation provides a more readable and convenient syntax to create formatted strings than a string composite formatting feature.
// Composite formatting:
Console.WriteLine("Hello, {0}! Today is {1}, it's {2:HH:mm} now.", name, date.DayOfWeek, date);
// String interpolation:
Console.WriteLine($"Hello, {name}! Today is {date.DayOfWeek}, it's {date:HH:mm} now.");
Both calls produce the same output that is similar to:
Hello, Mark! Today is Wednesday, it's 19:40 now.
Or in Java you can use the printf variant of System.out:
System.out.printf("%s %s", foo, bar);
Remember to put in a "\n" [line feed] at the end, if there are multiple lines to print.