Is there a ReadWord() method in the .NET Framework? - c#

I'd hate to reinvent something that was already written, so I'm wondering if there is a ReadWord() function somewhere in the .NET Framework that extracts words from text delimited by whitespace and line breaks.
If not, do you have an implementation that you'd like to share?
string data = "Four score and seven years ago";
List<string> words = new List<string>();
WordReader reader = new WordReader(data);
while (true)
{
string word = reader.ReadWord();
if (string.IsNullOrEmpty(word)) return;
//additional parsing logic goes here
words.Add(word);
}

Not that I'm aware of directly. If you don't mind getting them all in one go, you could use a regular expression:
Regex wordSplitter = new Regex(@"\W+");
string[] words = wordSplitter.Split(data);
If you have leading/trailing whitespace you'll get an empty string at the beginning or end, but you could always call Trim first.
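As a minimal sketch of the Trim-then-split idea above: this variant splits on runs of whitespace (`\s+`) rather than on non-word characters (`\W+`), which is my substitution, not the answer's pattern. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceSplitDemo
{
    // Splits on runs of whitespace. Trimming first avoids an empty entry
    // at the start or end when the input has leading/trailing whitespace.
    // Note: an empty or all-whitespace input still yields one empty string.
    public static string[] SplitWords(string data)
    {
        return Regex.Split(data.Trim(), @"\s+");
    }

    public static void Main()
    {
        string data = "  Four score and\r\nseven years ago  ";
        foreach (string word in SplitWords(data))
        {
            Console.WriteLine("'{0}'", word);
        }
    }
}
```

With `\s+` punctuation stays attached to its word; the original `\W+` would strip it instead, so pick whichever matches your definition of a word.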
A different option is to write a method which reads a word based on a TextReader. It could even be an extension method if you're using .NET 3.5. Sample implementation:
using System;
using System.IO;
using System.Text;
public static class Extensions
{
public static string ReadWord(this TextReader reader)
{
StringBuilder builder = new StringBuilder();
int c;
// Ignore any trailing whitespace from previous reads
while ((c = reader.Read()) != -1)
{
if (!char.IsWhiteSpace((char) c))
{
break;
}
}
// Finished?
if (c == -1)
{
return null;
}
builder.Append((char) c);
while ((c = reader.Read()) != -1)
{
if (char.IsWhiteSpace((char) c))
{
break;
}
builder.Append((char) c);
}
return builder.ToString();
}
}
public class Test
{
static void Main()
{
// Give it a few challenges :)
string data = @"Four score and
seven years ago ";
using (TextReader reader = new StringReader(data))
{
string word;
while ((word = reader.ReadWord()) != null)
{
Console.WriteLine("'{0}'", word);
}
}
}
}
Output:
'Four'
'score'
'and'
'seven'
'years'
'ago'

Not as such, however you could use String.Split to split the string into an array of string based on a delimiting character or string. You can also specify multiple strings / characters for the split.
If you'd prefer to do it without loading everything into memory then you could write your own stream class that does it as it reads from a stream but the above is a quick fix for small amounts of data word splitting.
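The streaming idea mentioned above could look roughly like the following. It is only a sketch: `ReadWords` and the class name are hypothetical, and it treats any whitespace character as a delimiter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class WordStreamDemo
{
    // Yields one whitespace-delimited word at a time, so only the word
    // currently being read is buffered in memory.
    public static IEnumerable<string> ReadWords(TextReader reader)
    {
        var builder = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (char.IsWhiteSpace((char)c))
            {
                // A delimiter ends the current word, if there is one.
                if (builder.Length > 0)
                {
                    yield return builder.ToString();
                    builder.Clear();
                }
            }
            else
            {
                builder.Append((char)c);
            }
        }
        // Emit the last word when the input does not end with whitespace.
        if (builder.Length > 0)
        {
            yield return builder.ToString();
        }
    }

    public static void Main()
    {
        using (var reader = new StringReader("Four score and\nseven years ago "))
        {
            foreach (string word in ReadWords(reader))
            {
                Console.WriteLine(word);
            }
        }
    }
}
```

Because it takes a TextReader, the same method should work with a StreamReader over a file as well as with a StringReader.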

Related

getting CS0029 error when using StringBuilder

I'm trying to refresh my knowledge of C# and came across this problem:
Have the function StringChallenge(str) take the str parameter being passed and return a compressed version of the string using the Run-length encoding algorithm. This algorithm works by taking the occurrence of each repeating character and outputting that number along with a single character of the repeating sequence. For example: "wwwggopp" would return 3w2g1o2p. The string will not contain any numbers, punctuation, or symbols.
and my code is
using System;
using System.Text;
class MainClass {
public static string StringChallenge(string str) {
// code goes here
var newString = new StringBuilder();
var result = new StringBuilder();
foreach (var c in str){
if (newString.Length == 0 || newString[newString.Length - 1] == c){
newString.Append(c);
}
else{
result.Append($"{newString.Length}{newString[0]}");
newString.Clear();
newString.Append(c);
}
}
if (newString.Length > 0){
result.Append($"{newString.Length}{newString[0]}");
}
return result;
}
static void Main() {
// keep this function call here
Console.WriteLine(StringChallenge(Console.ReadLine()));
}
}
Please help. Thank you!
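CS0029 is the "cannot implicitly convert type" error. In the code above, `return result;` returns a StringBuilder from a method declared to return string, and StringBuilder has no implicit conversion to string; calling `ToString()` resolves it. Below is a minimal sketch of the same run-length encoding with that fix applied. It uses a simpler counting loop than the original, and the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class RunLengthDemo
{
    public static string Encode(string str)
    {
        var result = new StringBuilder();
        int i = 0;
        while (i < str.Length)
        {
            char current = str[i];
            int runLength = 0;
            // Count how many times the current character repeats.
            while (i < str.Length && str[i] == current)
            {
                runLength++;
                i++;
            }
            result.Append(runLength).Append(current);
        }
        // Convert explicitly: the method's return type is string.
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Encode("wwwggopp")); // prints 3w2g1o2p
    }
}
```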

C# Split a string and build a stringarray out of the string [duplicate]

I need to split a string into newlines in .NET and the only way I know of to split strings is with the Split method. However that will not allow me to (easily) split on a newline, so what is the best way to do it?
To split on a string you need to use the overload that takes an array of strings:
string[] lines = theText.Split(
new string[] { Environment.NewLine },
StringSplitOptions.None
);
Edit:
If you want to handle different types of line breaks in a text, you can use the ability to match more than one string. This will correctly split on either type of line break, and preserve empty lines and spacing in the text:
string[] lines = theText.Split(
new string[] { "\r\n", "\r", "\n" },
StringSplitOptions.None
);
What about using a StringReader?
using (System.IO.StringReader reader = new System.IO.StringReader(input)) {
string line = reader.ReadLine();
}
Try to avoid using string.Split for a general solution, because you'll use more memory everywhere you use the function -- the original string, and the split copy, both in memory. Trust me that this can be one hell of a problem when you start to scale -- run a 32-bit batch-processing app processing 100MB documents, and you'll crap out at eight concurrent threads. Not that I've been there before...
Instead, use an iterator like this;
public static IEnumerable<string> SplitToLines(this string input)
{
if (input == null)
{
yield break;
}
using (System.IO.StringReader reader = new System.IO.StringReader(input))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
This will allow you to do a more memory efficient loop around your data;
foreach(var line in document.SplitToLines())
{
// one line at a time...
}
Of course, if you want it all in memory, you can do this;
var allTheLines = document.SplitToLines().ToArray();
You should be able to split your string pretty easily, like so:
aString.Split(Environment.NewLine.ToCharArray());
Based on Guffa's answer, in an extension class, use:
public static string[] Lines(this string source) {
return source.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
}
Regex is also an option:
private string[] SplitStringByLineFeed(string inpString)
{
string[] locResult = Regex.Split(inpString, "[\r\n]+");
return locResult;
}
For a string variable s:
s.Split(new string[]{Environment.NewLine},StringSplitOptions.None)
This uses your environment's definition of line endings. On Windows, line endings are CR-LF (carriage return, line feed) or in C#'s escape characters \r\n.
This is a reliable solution, because if you recombine the lines with String.Join, this equals your original string:
var lines = s.Split(new string[]{Environment.NewLine},StringSplitOptions.None);
var reconstituted = String.Join(Environment.NewLine,lines);
Debug.Assert(s==reconstituted);
What not to do:
Use StringSplitOptions.RemoveEmptyEntries, because this will break markup such as Markdown where empty lines have syntactic purpose.
Split on separator Environment.NewLine.ToCharArray(), because on Windows this will create one empty string element for each new line.
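To illustrate the second point, here is a small sketch comparing the two kinds of separator. It hard-codes "\r\n" instead of Environment.NewLine so the result is the same on every platform; the helper names are hypothetical.

```csharp
using System;

public static class NewLineSplitDemo
{
    // Treats the two-character sequence "\r\n" as one separator.
    public static string[] SplitOnString(string text)
    {
        return text.Split(new string[] { "\r\n" }, StringSplitOptions.None);
    }

    // Treats '\r' and '\n' as two independent separators.
    public static string[] SplitOnChars(string text)
    {
        return text.Split(new char[] { '\r', '\n' }, StringSplitOptions.None);
    }

    public static void Main()
    {
        string text = "a\r\nb";
        Console.WriteLine(SplitOnString(text).Length); // 2: "a", "b"
        Console.WriteLine(SplitOnChars(text).Length);  // 3: "a", "", "b"
    }
}
```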
I just thought I would add my two-bits, because the other solutions on this question do not fall into the reusable code classification and are not convenient.
The following block of code extends the string object so that it is available as a natural method when working with strings.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections;
using System.Collections.ObjectModel;
namespace System
{
public static class StringExtensions
{
public static string[] Split(this string s, string delimiter, StringSplitOptions options = StringSplitOptions.None)
{
return s.Split(new string[] { delimiter }, options);
}
}
}
You can now use the .Split() function from any string as follows:
string[] result;
// Pass a string, and the delimiter
result = "My simple string".Split(" ");
// Split an existing string by delimiter only
string foo = "my - string - i - want - split";
result = foo.Split("-");
// You can even pass the split options parameter. When omitted it is
// set to StringSplitOptions.None
result = foo.Split("-", StringSplitOptions.RemoveEmptyEntries);
To split on a newline character, simply pass "\n" or "\r\n" as the delimiter parameter.
Comment: It would be nice if Microsoft implemented this overload.
Starting with .NET 6 we can use the new String.ReplaceLineEndings() method to canonicalize cross-platform line endings, so these days I find this to be the simplest way:
var lines = input
.ReplaceLineEndings()
.Split(Environment.NewLine, StringSplitOptions.None);
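A small sketch of the same approach on input with mixed line endings. It assumes .NET 6 or later, and it normalizes to "\n" explicitly (using the overload that takes a replacement string) rather than to Environment.NewLine, so the behaviour does not depend on the platform. The class and method names are made up for illustration.

```csharp
using System;

public static class LineEndingsDemo
{
    public static string[] SplitLines(string input)
    {
        // Normalize every line ending to "\n", then split on that one character.
        return input.ReplaceLineEndings("\n").Split('\n');
    }

    public static void Main()
    {
        string mixed = "one\r\ntwo\nthree\rfour";
        foreach (string line in SplitLines(mixed))
        {
            Console.WriteLine(line);
        }
    }
}
```

Empty lines are preserved, because no StringSplitOptions.RemoveEmptyEntries is applied.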
I'm currently using this function (based on other answers) in VB.NET:
Private Shared Function SplitLines(text As String) As String()
Return text.Split({Environment.NewLine, vbCrLf, vbLf}, StringSplitOptions.None)
End Function
It tries to split on the platform-local newline first, and then falls back to each possible newline.
I've only needed this inside one class so far. If that changes, I will probably make this Public and move it to a utility class, and maybe even make it an extension method.
Here's how to join the lines back up, for good measure:
Private Shared Function JoinLines(lines As IEnumerable(Of String)) As String
Return String.Join(Environment.NewLine, lines)
End Function
Well, actually split should do:
//Constructing string...
StringBuilder sb = new StringBuilder();
sb.AppendLine("first line");
sb.AppendLine("second line");
sb.AppendLine("third line");
string s = sb.ToString();
Console.WriteLine(s);
//Splitting multiline string into separate lines
string[] splitted = s.Split(new string[] {System.Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
// Output (separate lines)
for( int i = 0; i < splitted.Count(); i++ )
{
Console.WriteLine("{0}: {1}", i, splitted[i]);
}
string[] lines = text.Split(
Environment.NewLine.ToCharArray(),
StringSplitOptions.RemoveEmptyEntries);
The RemoveEmptyEntries option will make sure you don't have empty entries due to \n following a \r
(Edit to reflect comments:) Note that it will also discard genuine empty lines in the text. This is usually what I want but it might not be your requirement.
I did not know about Environment.NewLine, but I guess this is a very good solution.
My try would have been:
string str = "Test Me\r\nTest Me\nTest Me";
var splitted = str.Split('\n').Select(s => s.Trim()).ToArray();
The additional .Trim removes any \r or \n that might still be present (e.g. when on Windows but splitting a string with OS X newline characters). Probably not the fastest method though.
EDIT:
As the comments correctly pointed out, this also removes any whitespace at the start of the line or before the new line feed. If you need to preserve that whitespace, use one of the other options.
The examples here are great and helped me with a current "challenge": splitting RSA keys so they can be presented in a more readable way. Based on Steve Cooper's solution:
string Splitstring(string txt, int n = 120, string AddBefore = "", string AddAfterExtra = "")
{
//Split the string into a list of chunks, each n characters long
var Lines = Enumerable.Range(0, txt.Length / n).Select(i => txt.Substring(i * n, n)).ToList();
//Check if there are any characters left after split, if so add the rest
if(txt.Length > ((txt.Length / n)*n) )
Lines.Add(txt.Substring((txt.Length/n)*n));
//Create return text, with extras
string txtReturn = "";
foreach (string Line in Lines)
txtReturn += AddBefore + Line + AddAfterExtra + Environment.NewLine;
return txtReturn;
}
Presenting an RSA key at a width of 33 characters, wrapped in quotes, is then simply:
Console.WriteLine(Splitstring(RSAPubKey, 33, "\"", "\""));
Output:
Hopefully someone finds it useful...
Silly answer: write to a temporary file so you can use the venerable
File.ReadLines
var s = "Hello\r\nWorld";
var path = Path.GetTempFileName();
using (var writer = new StreamWriter(path))
{
writer.Write(s);
}
var lines = File.ReadLines(path);
using System.IO;
string textToSplit;
if (textToSplit != null)
{
List<string> lines = new List<string>();
using (StringReader reader = new StringReader(textToSplit))
{
for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
{
lines.Add(line);
}
}
}
Very easy, actually.
VB.NET:
Private Function SplitOnNewLine(input As String) As String()
Return input.Split({Environment.NewLine}, StringSplitOptions.None)
End Function
C#:
string[] SplitOnNewLine(string input)
{
return input.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
}

Find string in txt file using a list c# [closed]

I am trying to find out if a .txt file contains words stored in a list named Abreviated. This list is filled by reading values from a CSV file, as shown below:
StreamReader sr = new StreamReader(@"C:\textwords.csv");
string TxtWrd = sr.ReadLine();
while ((TxtWrd = sr.ReadLine()) != null)
{
Words = TxtWrd.Split(Seperators, StringSplitOptions.None);
Abreviated.Add(Words[0]);
Expanded.Add(Words[1]);
}
I would like to use this list to check if a .txt file contains any of the words in the list. The .txt file is read using a StreamReader and is stored as a string, FileContent. The code I have to try to find the matches is below:
if (FC.Contains(Abreviated.ToString()))
{
MessageBox.Show("Match found");
}
else
{
MessageBox.Show("No Match");
}
This always takes the else branch, even though one of the words is in the text file.
Any advice on how to get this working?
Thanks in advance!
You can use a key-value data structure to store each abbreviated word together with its full form. In C#, Dictionary is the generic implementation for storing key-value pairs.
I've refactored your code to make it easier to reuse.
internal class FileParser
{
internal Dictionary<string, string> WordDictionary = new Dictionary<string, string>();
private string _filePath;
private char Seperators => ',';
internal FileParser(string filePath)
{
_filePath = filePath;
}
internal void Parse()
{
StreamReader sr = new StreamReader(_filePath);
string TxtWrd = sr.ReadLine();
while ((TxtWrd = sr.ReadLine()) != null)
{
var words = TxtWrd.Split(Seperators, StringSplitOptions.None);
//WordDictionary.TryAdd(Words[0], Words[1]); // available in .NET corefx https://github.com/dotnet/corefx/issues/1942
if (!WordDictionary.ContainsKey(words[0]))
WordDictionary.Add(words[0], words[1]);
}
}
internal bool IsWordAvailable(string word)
{
return WordDictionary.ContainsKey(word);
}
}
Now you can reuse the above class within your assembly as follows:
public class Program
{
public static void Main(string[] args)
{
var fileParser = new FileParser(@"C:\textwords.csv");
fileParser.Parse(); // load the CSV first; otherwise the dictionary is empty
if(fileParser.IsWordAvailable("abc"))
{
MessageBox.Show("Match found");
}
else
{
MessageBox.Show("No Match");
}
}
}
You are comparing your entire file's content to the string representation of a collection of words. You need to compare each individual word found in the file content to your abbreviated list. One way you could do the comparison is to split the file content into individual words and then look those up individually against your abbreviated list.
string[] fileWords = FC.Split(Seperators, StringSplitOptions.RemoveEmptyEntries);
bool hasMatch = false;
foreach (string fileWord in fileWords)
{
if (Abreviated.Contains(fileWord))
{
hasMatch = true;
break;
}
}
if (hasMatch)
{
MessageBox.Show("Match found");
}
else
{
MessageBox.Show("No Match");
}
I would recommend switching your abbreviated collection to a HashSet or a Dictionary that also includes your matching expanded text for the abbreviation. Also, there are probably alternate ways to do the search you are looking for with regex.
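A rough sketch of the Dictionary suggestion above. The separator set and the "brb" entry are sample data I made up for illustration (only the "wuu2" pair appears in this thread), and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class AbbreviationDemo
{
    // Returns each distinct word of fileContent that is a key in abbreviations,
    // in order of first appearance.
    public static List<string> FindAbbreviations(
        string fileContent, Dictionary<string, string> abbreviations)
    {
        // Assumed delimiters; extend as needed for your files.
        char[] separators = { ' ', '.', ',', ';', ':', '\r', '\n', '\t' };
        var found = new List<string>();
        foreach (string word in fileContent.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // ContainsKey is a hash lookup, so it stays cheap for large lists.
            if (abbreviations.ContainsKey(word) && !found.Contains(word))
            {
                found.Add(word);
            }
        }
        return found;
    }

    public static void Main()
    {
        var abbreviations = new Dictionary<string, string>
        {
            { "wuu2", "what are you up to" },
            { "brb", "be right back" } // made-up sample entry
        };
        string content = "hey, wuu2 today? brb.";
        foreach (string abbreviation in FindAbbreviations(content, abbreviations))
        {
            Console.WriteLine("{0} = {1}", abbreviation, abbreviations[abbreviation]);
        }
    }
}
```

The lookup here is case-sensitive; passing StringComparer.OrdinalIgnoreCase to the Dictionary constructor would be one way to relax that.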
I'm unsure on what some of your variables are so this may be slightly different to what you have, but gives the same functionality.
static void Main(string[] args)
{
List<string> abbreviated = new List<string>();
List<string> expanded = new List<string>();
StreamReader sr = new StreamReader("textwords.csv");
string TxtWrd = "";
while ((TxtWrd = sr.ReadLine()) != null)
{
Debug.WriteLine("line: " + TxtWrd);
string[] Words = TxtWrd.Split(new char[] { ',' } , StringSplitOptions.None);
abbreviated.Add(Words[0]);
expanded.Add(Words[1]);
}
if (abbreviated.Contains("wuu2"))
{
//show message box
} else
{
//don't
}
}
As mentioned in one of the comments, a Dictionary might be better suited for this.
This assumes that the data in your file is in the following format, with a new set on each line.
wuu2,what are you up to
If all you want to do is check if a text file contains words in your list, you can read the entire contents of the file into a string (instead of line by line), split the string on your separators, and then check if the intersection of the words in the text file and your list of words has any items:
// Get the "separators" into a list
var wordsFile = @"c:\public\temp\textWords.csv"; // (@"C:\textwords.csv");
var separators = File.ReadAllText(wordsFile).Split(',');
// Get the words of the file into a list (add more delimiters as necessary)
var txtFile = @"c:\public\temp\temp.txt";
var allWords = File.ReadAllText(txtFile).Split(new[] {' ', '.', ',', ';', ':', '\r', '\n'});
// Get the intersection of the file words and the separator words
var commonWords = allWords.Intersect(separators).ToList().Distinct();
if (commonWords.Any())
{
Console.WriteLine("The text file contains the following matching words:");
Console.WriteLine(string.Join(", ", commonWords));
}
else
{
Console.WriteLine("The file did not contain any matching words.");
}
Console.Write("\nDone!\nPress any key to exit...");
Console.ReadKey();

c# string loop without specific number of loops

I have a problem with looping code:
using System;
using System.Globalization;
namespace test
{
class Program
{
static void Main(string[] args)
{
string text = Console.ReadLine();
TextInfo ti = CultureInfo.CurrentCulture.TextInfo;
Console.WriteLine(ti.ToTitleCase(text).Replace(" ", string.Empty));
Console.ReadKey();
}
}
}
Maybe it's written in a way it shouldn't be, because I can't find a way to fix it. To be specific, I want this program to accept a sentence spread over an unknown number of lines, delete all whitespace, and change the first letter of every word to upper case. So, for example, the input is:
I wanna ride bicycle,
but Rick say skateboard is better.
And output is:
IWannaRideBicycle,
ButRickSaySkateboardIsBetter.
The program can't have a user interface, so I'm thinking about a while loop that builds a list of strings, but the problem for me is still how to loop. I found a C++ solution that uses "while (getline(cin, text)) {}", but I don't think that's usable in C#.
A while loop should do the trick. Console.ReadLine returns null if no more lines are available.
static void Main(string[] args)
{
List<string> converted = new List<string>();
while (true) // forever
{
string text = Console.ReadLine();
if (text == null)
{
break; // no more lines available - break out of loop.
}
// Convert to capitals
TextInfo ti = CultureInfo.CurrentCulture.TextInfo;
string convertedText = ti.ToTitleCase(text).Replace(" ", "");
converted.Add(convertedText);
}
// Now display the converted lines
foreach (string text in converted)
{
Console.WriteLine(text);
}
}
var text = Console.ReadLine();
while(!String.IsNullOrEmpty(text))
{
TextInfo ti = CultureInfo.CurrentCulture.TextInfo;
Console.WriteLine(ti.ToTitleCase(text).Replace(" ", string.Empty));
text = Console.ReadLine();
}
I think this may be a suitable solution. It will read input from the console until the user simply hits 'Enter' and sends you an empty string. I don't know that there is a more dynamic way to achieve what you're after.
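For comparison, here is a sketch of the read-until-null loop written against a TextReader, which is roughly the C# counterpart of the C++ getline loop from the question. It uses the invariant culture instead of CurrentCulture so the result does not depend on machine settings; that choice, and the class and method names, are my assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class TitleCaseDemo
{
    // Reads lines until the reader is exhausted (ReadLine returns null),
    // title-cases each line and removes the spaces.
    public static List<string> ConvertLines(TextReader input)
    {
        TextInfo ti = CultureInfo.InvariantCulture.TextInfo;
        var converted = new List<string>();
        string line;
        while ((line = input.ReadLine()) != null)
        {
            converted.Add(ti.ToTitleCase(line).Replace(" ", string.Empty));
        }
        return converted;
    }

    public static void Main()
    {
        // Console.In is a TextReader, so piped or redirected input works too.
        foreach (string line in ConvertLines(Console.In))
        {
            Console.WriteLine(line);
        }
    }
}
```

Unlike the IsNullOrEmpty check, this loop does not stop at an empty line; it ends only when the input ends (for example, at the end of a redirected file).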
It depends what your input is.
If it's an IEnumerable<string> then simply use foreach:
var lines = ... //some IEnumerable<string>
foreach (var line in lines)
{
//do your thing
}
This will keep on looping as long as there is one more line to enumerate in lines. It can, in theory, keep on going forever.
If your input is a Stream then build a StreamReader around it and basically do the same (but with more plumbing):
using (var inputStream = ...// some stream)
using (var reader = new StreamReader(inputStream))
{
var line = reader.ReadLine();
while (line != null)
{
//do your thing
line = reader.ReadLine();
}
}
This again will loop as long as the input stream can produce a new line.

Match a string against an easy pattern

I am trying to future-proof a program I am creating so that the pattern users must enter is not hard-coded. There is always a chance that the letter or number pattern will change, and when it does I need everyone to remain consistent. I also want the managers to be able to control what goes in without relying on me. Is it possible to use regex or another string tool to compare input against a list stored in a database? I want it to be easy, so the patterns stored in the database would look like X###### or X######-X####### and so on.
Sure, just store the regular expression rules in a string column in a table and then load them into an IEnumerable<Regex> in your app. Then, a match is simply if ANY of those rules match. Beware that conflicting rules could be prone to greedy race (first one to be checked wins) so you'd have to be careful there. Also be aware that there are many optimizations that you could perform beyond my example, which is designed to be simple.
List<string> regexStrings = db.GetRegexStrings();
var result = new List<Regex>(regexStrings.Count);
foreach (var regexString in regexStrings)
{
result.Add(new Regex(regexString));
}
...
// The check
bool matched = result.Any(i => i.IsMatch(testInput));
You could store your patterns as-is in your database, and then translate them to regexes.
I don't know specifically what characters you'd need in your format, but let's suppose you just want to substitute a number to # and leave the rest as-is, here's some code for that:
public static Regex ConvertToRegex(string pattern)
{
var sb = new StringBuilder();
sb.Append("^");
foreach (var c in pattern)
{
switch (c)
{
case '#':
sb.Append(@"\d");
break;
default:
sb.Append(Regex.Escape(c.ToString()));
break;
}
}
sb.Append("$");
return new Regex(sb.ToString());
}
You can also use options like RegexOptions.IgnoreCase if that's what you need.
NB: For some reason, Regex.Escape escapes the # character, even though it's not special... So I just went for the character-by-character approach.
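To show how such a converter might be combined with a list of stored masks, here is a small sketch. The in-memory list stands in for whatever the database returns, `MaskToRegex` and `MatchesAny` are hypothetical names, and it uses `[0-9]` rather than `\d` so that only ASCII digits match (`\d` also accepts other Unicode digits).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class MaskDemo
{
    // Translates a mask such as "X######" into an anchored regex:
    // '#' becomes one ASCII digit, everything else is matched literally.
    public static Regex MaskToRegex(string mask)
    {
        var sb = new StringBuilder("^");
        foreach (char c in mask)
        {
            sb.Append(c == '#' ? "[0-9]" : Regex.Escape(c.ToString()));
        }
        sb.Append("$");
        return new Regex(sb.ToString());
    }

    // True if the input matches at least one of the masks.
    public static bool MatchesAny(IEnumerable<string> masks, string input)
    {
        return masks.Any(mask => MaskToRegex(mask).IsMatch(input));
    }

    public static void Main()
    {
        // Stand-in for masks loaded from the database.
        var masks = new List<string> { "X######", "X######-X#######" };
        Console.WriteLine(MatchesAny(masks, "X123456"));          // True
        Console.WriteLine(MatchesAny(masks, "X123456-X1234567")); // True
        Console.WriteLine(MatchesAny(masks, "Y123456"));          // False
    }
}
```

In a real application the compiled Regex objects would probably be cached rather than rebuilt on every check.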
private bool TestMethod()
{
const string textPattern = "X###";
string text = textBox1.Text;
bool match = true;
if (text.Length == textPattern.Length)
{
char[] chrStr = text.ToCharArray();
char[] chrPattern = textPattern.ToCharArray();
int length = text.Length;
for (int i = 0; i < length; i++)
{
if (chrPattern[i] != '#')
{
if (chrPattern[i] != chrStr[i])
{
return false;
}
}
}
}
else
{
return false;
}
return match;
}
This is doing everything I need it to do now. Thanks for all the tips though. I will have to look into the regex more in the future.
Using MaskedTextProvider, you could do something like this:
using System.Globalization;
using System.ComponentModel;
string pattern = "X&&&&&&-X&&&&&&&";
string text = "Xabcdef-Xasdfghi";
var culture = CultureInfo.GetCultureInfo("sv-SE");
var matcher = new MaskedTextProvider(pattern, culture);
int position;
MaskedTextResultHint hint;
if (!matcher.Set(text, out position, out hint))
{
Console.WriteLine("Error at {0}: {1}", position, hint);
}
else if (!matcher.MaskCompleted)
{
Console.WriteLine("Not enough characters");
}
else if (matcher.ToString() != text)
{
Console.WriteLine("Missing literals");
}
else
{
Console.WriteLine("OK");
}
For a description of the format, see: http://msdn.microsoft.com/en-us/library/system.windows.forms.maskedtextbox.mask
