Parsing a string and finding specific text in C#

I have a string that contains, let's say, the word "key" 5 times. Each time I see the word "key" in that string, I want to print some text. How can I parse the string, find every occurrence of "key", and print the text accordingly? 5 occurrences of "key" means 5 printed texts. This needs to be done in C#.
Thanks in advance.

How about using Regex.Matches:
string input = ...
string toPrint = ...
foreach (Match m in Regex.Matches(input, "key"))
Console.WriteLine(toPrint);
EDIT: If by "word", you mean 'whole words', you need a different regex, such as:
@"\bkey\b"

Inside a loop, you can use the IndexOf() method, which has an overload that takes a starting position, and with each iteration you advance the starting position; the loop exits when you reach the string-not-found condition (IndexOf returns -1).
EDIT: As for printing the text, that depends on where you want to print it.
EDIT2: You also need to consider whether the target string can appear in a way you would not consider a true "hit":
The key to success, the master key, is getting off your keyster...
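The loop described above could be sketched roughly as follows. The helper name FindAll is hypothetical, and the sketch assumes an ordinal, case-sensitive search that does not count overlapping matches:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collects the start index of every occurrence of
// `target` in `input` by calling IndexOf with an advancing start position.
static List<int> FindAll(string input, string target)
{
    var positions = new List<int>();
    if (string.IsNullOrEmpty(target)) return positions;

    int start = 0;
    while (true)
    {
        int hit = input.IndexOf(target, start, StringComparison.Ordinal);
        if (hit < 0) break;            // string-not-found condition
        positions.Add(hit);
        start = hit + target.Length;   // continue after this match
    }
    return positions;
}

string sample = "The key to success, the master key, is getting off your keyster...";
foreach (int position in FindAll(sample, "key"))
    Console.WriteLine($"Found \"key\" at index {position}");
```

Note that this reports three hits for the sample sentence, including the one inside "keyster", which is exactly the false-positive case mentioned above.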

I have an extension method I use for strings to get the indexes of a substring, since .NET only provides IndexOf (a single result for the first substring match).
public static class Extensions
{
public static int[] IndexesOf(this string str, string sub)
{
int[] result = new int[0];
for(int i=0; i < str.Length; ++i)
{
if(i + sub.Length > str.Length)
break;
if(str.Substring(i,sub.Length).Equals(sub))
{
Array.Resize(ref result, result.Length + 1);
result[result.Length - 1] = i;
}
}
return result;
}
}
You could use the extension method for all instances of key to print something
int[] indexes = stringWithKeys.IndexesOf("key");
foreach(int index in indexes)
{
// print something
}
I know my code example may be longest but the extension method is reusable and you could place it in a "utility" type library for later use.

If it is a multiple-word string, you could use LINQ.
string texttosearch = ...
string texttofind = ...
string[] source = texttosearch.Split(new char[] { '.', '?', '!', ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries);
// Compare case-insensitively, one whole word at a time.
var matchQuery = from word in source
                 where word.ToLowerInvariant() == texttofind.ToLowerInvariant()
                 select word;
foreach (string s in matchQuery)
    Console.WriteLine("whatever you want to print");

Related

I am trying to read a txt file and read column one to get column two

I have a txt file with two columns of values, all numbers. The first column looks like 00 to 0000000 (2 to 12 characters long) and the second column looks like 0120 to 0111111111 (4 to 12 characters long). My problems are multiple:
How to find a specific value (like a Boolean search)
How to return the corresponding value as its own string
I have toyed with StreamReader unsuccessfully (not even able to make any of it work), and I have found such things as .Split, .Parse, and tried many examples on here and the net that were not actually doing what I needed.
/* Example of useless code I found */
class ReadFromFile
{
static void Main()
{
char[] delimiterChars = { ' ', ',', '.', ':', '\t' };
string text = "one\ttwo three:four,five six seven";
System.Console.WriteLine($"Original text: '{text}'");
string[] words = text.Split(delimiterChars);
System.Console.WriteLine($"{words.Length} words in text:");
foreach (var word in words)
{
System.Console.WriteLine($"{word}");
}
}
}
That code was pretty useless for my purpose: it just uses .Split to break the text into separate words and does nothing to find what comes after a specific value.
Specifically, I want to search for value x and get value y saved as string z (using math terms here, except for the word string).

Although you don't specify it, I presume your file will have multiple lines, each matching your description, like this:
00 0120
0000 0111111111
(etc.)
So you need to read each line, parse it using .Split, and look for the value you want. If you only need to do it once, it is best to check each line immediately after you read it, using StreamReader as in this example:
using System;
using System.IO;
class Test
{
    public static void Main()
    {
        char[] delimiterChars = { ' ', ',', '.', ':', '\t' };
        using (StreamReader sr = new StreamReader(@"d:\temp\TestFile.txt"))
        {
            string line;
            // ReadLine returns null at the end of the file.
            while ((line = sr.ReadLine()) != null)
            {
                string[] words = line.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
                // Skip lines that do not have both columns.
                if (words.Length >= 2 && words[0] == "00")
                {
                    Console.WriteLine($"Found it, yay! words[0]={words[0]}, words[1]={words[1]}");
                }
            }
        }
    }
}
If you want to search more than once, instead of searching you could put the split words in some data structure - maybe a Dictionary - and search it later as many times as you want.
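A minimal sketch of the Dictionary idea, assuming each line holds exactly two whitespace-separated columns and that first-column values are unique. The in-memory lines array is a stand-in for the file, and the names lookup, x and z are only illustrative:

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the file's contents; in practice File.ReadLines(path)
// could supply the lines. The rows reuse the sample layout shown above.
string[] lines = { "00 0120", "0000 0111111111" };
char[] delimiters = { ' ', '\t' };

var lookup = new Dictionary<string, string>();
foreach (string line in lines)
{
    string[] words = line.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
    if (words.Length >= 2)
        lookup[words[0]] = words[1];   // a repeated key overwrites the earlier value
}

// x is the value searched for; z receives the matching second column.
string x = "0000";
if (lookup.TryGetValue(x, out var z))
    Console.WriteLine($"{x} -> {z}");
else
    Console.WriteLine($"{x} not found");
```

Keeping both columns as strings preserves the leading zeros, which would be lost if the values were parsed as numbers.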

Like this?:
char[] delimiterChars = { ' ', ',', '.', ':', '\t' };
string text = "one\ttwo three:four,five six seven";
string[] words = text.Split(delimiterChars);
List<string> values = new List<string>();
// Walk the words in pairs; stop before a trailing word that has no partner.
for (int i = 0; i + 1 < words.Length; i += 2)
{
    if (words[i].Length >= 2 && words[i].Length <= 12
        && words[i + 1].Length >= 4 && words[i + 1].Length <= 12)
    {
        values.Add(words[i + 1]);
    }
}
I'm really not sure if I understood your question at all :P
(Just comment if I'm way off, and I'll delete the answer again)

SOLUTION:
Please check my code at https://dotnetfiddle.net/txr4Qz and see if it helps you.
For your problem 1: How to find the specific value (like a Boolean search)
You have to check whether the word is of a specific type. For example:
// Checks whether both words are numbers (change this to suit your requirement).
// long is used because values of up to 12 digits do not fit in an int.
if (long.TryParse (words[i], out long _) && long.TryParse (words[i+1], out long _)) { ...Statements... }
For your problem 2: How to return the corresponding value to its own string
You are already getting a string form of data from the file. So store it and process it as required.

Check string for invalid characters? Smartest way?

I would like to check a string for invalid characters. By invalid characters I mean characters that should not be there. Which characters are these? That varies, but I think that's not so important; what matters is how I should do it, and what the easiest and best-performing way is.
Let's say I only want strings that contain 'A-Z', 'empty', '.', '$', '0-9'
So if I have a string like "HELLO STaCKOVERFLOW" => invalid, because of the 'a'.
OK, now how to do that? I could make a List<char>, put every char that is not allowed in it, and check the string against this list. Maybe not a good idea, because there are a lot of chars then. But I could make a list that contains all of the allowed chars, right? And then? For every char in the string I have to compare against the List<char>? Any smart code for this? And another question: if I add A-Z to the List<char>, I have to add 26 chars manually, but as far as I know these chars are 65-90 in the ASCII table; can I add them more easily? Any suggestions? Thank you

You can use a regular expression for this:
Regex r = new Regex("[^A-Z0-9.$ ]");
if (r.IsMatch(SomeString)) {
// validation failed
}
To create a list of characters from A-Z or 0-9 you would use a simple loop:
for (char c = 'A'; c <= 'Z'; c++) {
// c or c.ToString() depending on what you need
}
But you don't need that with the Regex - pretty much every regex engine understands the range syntax (A-Z).

I have only just written such a function, along with an extended version that restricts the first and last characters when needed. The original function merely checks whether the string consists of valid characters only. The extended function takes two extra integers: the number of valid characters at the beginning of the list to skip when checking the first character and the last character, respectively. In practice it simply calls the original function 3 times. In the example below, it ensures that the string begins with a letter and doesn't end with an underscore.
StrChr(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
StrChrEx(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", 11, 1);
BOOL __cdecl StrChr(CHAR* str, CHAR* chars)
{
for (int s = 0; str[s] != 0; s++)
{
int c = 0;
while (true)
{
if (chars[c] == 0)
{
return false;
}
else if (str[s] == chars[c])
{
break;
}
else
{
c++;
}
}
}
return true;
}
BOOL __cdecl StrChrEx(CHAR* str, CHAR* chars, UINT excl_first, UINT excl_last)
{
char first[2] = {str[0], 0};
char last[2] = {str[strlen(str) - 1], 0};
if (!StrChr(str, chars))
{
return false;
}
if (excl_first != 0)
{
if (!StrChr(first, chars + excl_first))
{
return false;
}
}
if (excl_last != 0)
{
if (!StrChr(last, chars + excl_last))
{
return false;
}
}
return true;
}

If you are using C#, you can do this easily using List and Contains. You can do this with single characters (in a string) or a multi-character string just the same:
var pn = "The String To ChecK";
var badStrings = new List<string>()
{
" ","\t","\n","\r"
};
foreach(var badString in badStrings)
{
if(pn.Contains(badString))
{
//Do something
}
}

If you're not super good with regular expressions, then there is another way to go about this in C#. Here is a block of code I wrote to test a string variable named notifName:
var alphabet = "a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z";
var numbers = "0,1,2,3,4,5,6,7,8,9";
var specialChars = " ,(,),_,[,],!,*,-,.,+,-";
var validChars = (alphabet + "," + alphabet.ToUpper() + "," + numbers + "," + specialChars).Split(',');
for (int i = 0; i < notifName.Length; i++)
{
if (Array.IndexOf(validChars, notifName[i].ToString()) < 0) {
errorFound = $"Invalid character '{notifName[i]}' found in notification name.";
break;
}
}
You can change the characters added to the array as needed. The Array IndexOf method is the key to the whole thing. Of course if you want commas to be valid, then you would need to choose a different split character.

Not enough reps to comment directly, but I recommend the Regex approach. One small caveat: you probably need to anchor both ends of the input string, and you will want at least one character to match. So (with thanks to ThiefMaster), here's my regex to validate user input for a simple arithmetical calculator (plus, minus, multiply, divide):
Regex r = new Regex(@"^[0-9\.\-\+\*\/ ]+$");

I'd go with a regex, but still need to add my 2 cents here, because all the proposed non-regex solutions are O(MN) in the worst case (string is valid) which I find repulsive for religious reasons.
Even more so when LINQ offers a simpler and more efficient solution than nesting loops:
var isInvalid = "The String To Test".Intersect("ALL_INVALID_CHARS").Any();
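Since the question is phrased in terms of allowed characters, the same idea can be turned around into a whitelist check. This is only a sketch: IsValid is a hypothetical helper, and 'empty' in the question is assumed to mean the space character:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Build the allowed set once: space, '.', '$', then A-Z and 0-9 via loops.
var allowed = new HashSet<char>(" .$");
for (char c = 'A'; c <= 'Z'; c++) allowed.Add(c);
for (char c = '0'; c <= '9'; c++) allowed.Add(c);

// Hypothetical helper: true when every character is in the allowed set.
// Each lookup in the HashSet is O(1) on average, so the check is linear
// in the length of the string.
bool IsValid(string text) => text.All(ch => allowed.Contains(ch));

Console.WriteLine(IsValid("HELLO STACKOVERFLOW"));   // True
Console.WriteLine(IsValid("HELLO STaCKOVERFLOW"));   // False, because of the 'a'
```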

How to extract string at a certain character that is repeated within string?

How can I get "MyLibrary.Resources.Images.Properties" and "Condo.gif" from a "MyLibrary.Resources.Images.Properties.Condo.gif" string.
I also need it to be able to handle something like "MyLibrary.Resources.Images.Properties.legend.House.gif" and return "House.gif" and "MyLibrary.Resources.Images.Properties.legend".
IndexOf LastIndexOf wouldn't work because I need the second to last '.' character.
Thanks in advance!
UPDATE
Thanks for the answers so far, but I really need it to be able to handle different namespaces. So really what I'm asking is: how do I split on the second-to-last '.' character in a string?

You can use LINQ to do something like this:
string target = "MyLibrary.Resources.Images.Properties.legend.House.gif";
var elements = target.Split('.');
const int NumberOfFileNameElements = 2;
string fileName = string.Join(
".",
elements.Skip(elements.Length - NumberOfFileNameElements));
string path = string.Join(
".",
elements.Take(elements.Length - NumberOfFileNameElements));
This assumes that the file name part only contains a single . character, so to get it you skip the number of remaining elements.

You can either use a Regex or String.Split with '.' as the separator and return the second-to-last + '.' + last pieces.
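A rough sketch of the Regex variant, assuming the file name is always the last two dot-separated pieces. The helper name and its out parameters are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: splits at the second-to-last '.'.
// Group 1 is greedy and takes everything before that dot;
// group 2 is the final "name.extension" pair.
static bool TrySplitAtSecondToLastDot(string input, out string left, out string right)
{
    Match m = Regex.Match(input, @"^(.*)\.([^.]+\.[^.]+)$");
    left = m.Success ? m.Groups[1].Value : string.Empty;
    right = m.Success ? m.Groups[2].Value : string.Empty;
    return m.Success;
}

if (TrySplitAtSecondToLastDot(
        "MyLibrary.Resources.Images.Properties.legend.House.gif",
        out string path, out string fileName))
{
    Console.WriteLine(path);      // MyLibrary.Resources.Images.Properties.legend
    Console.WriteLine(fileName);  // House.gif
}
```

The helper returns false when the input contains fewer than two '.' characters, so the caller can decide how to handle that case.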

You can look for IndexOf("MyLibrary.Resources.Images.Properties."), add "MyLibrary.Resources.Images.Properties.".Length to that, and then call .Substring(..) from that position.

If you know exactly what you're looking for, and it's trailing, you could use string.EndsWith. Something like
if("MyLibrary.Resources.Images.Properties.Condo.gif".EndsWith("Condo.gif"))
If that's not the case, check out regular expressions. Then you could do something like
if(Regex.IsMatch("MyLibrary.Resources.Images.Properties.Condo.gif", @"Condo\.gif$"))
Or a more generic way: split the string on '.' then grab the last two items in the array.

string input = "MyLibrary.Resources.Images.Properties.legend.House.gif";
//if string isn't already validated, make sure there are at least two
//periods here or you'll error out later on.
int index = input.LastIndexOf('.', input.LastIndexOf('.') - 1);
string first = input.Substring(0, index);
string second = input.Substring(index + 1);

Try splitting the string into an array, by separating it by each '.' character.
You will then have something like:
{"MyLibrary", "Resources", "Images", "Properties", "legend", "House", "gif"}
You can then take the last two elements.

Just break down and do it in a char loop:
int NthLastIndexOf(string str, char ch, int n)
{
if (n <= 0) throw new ArgumentException();
for (int idx = str.Length - 1; idx >= 0; --idx)
if (str[idx] == ch && --n == 0)
return idx;
return -1;
}
This is less expensive than trying to coax it using string splitting methods and isn't a whole lot of code.
string s = "1.2.3.4.5";
int idx = NthLastIndexOf(s, '.', 3);
string a = s.Substring(0, idx); // "1.2"
string b = s.Substring(idx + 1); // "3.4.5"

Parse Text Row with Empty Spaces

I have a file, the text format is like this:
.640 .070 -.390 -.740 -1.030 -1.410 -1.780 -1.840
-1.360 -.360 .860 1.880 2.340 2.250 1.950 1.710
1.410 .700 -.300 -.840 -.280 1.020 1.860 1.460
.310 -.460 -.320 .350 1.020 1.650 2.430 3.070
2.840 1.440 -.460 -1.650 -1.520 -.520 .250 .190
-.420 -.870 -.800 -.280 .570 1.660 2.500 2.220
.520 -1.560 -2.530 -2.030 -1.200 -1.060 -1.230 -.600
.990 2.300 2.180 .940 -.090 -.140 .320 .470
.330 .420 .830 1.080 1.090 1.530 2.740 3.800
3.410 1.610 -.150 -.900 -1.120 -1.640 -2.140 -1.590
.210 2.210 3.290 3.170 2.380 1.880 2.530 4.210
5.280 3.820 -.040 -3.670 -4.190 -1.260 2.930 5.740
5.980 3.920 .540 -2.890 -5.010 -4.780 -2.150 1.640
4.670 5.540 4.230 1.950 .120 -.470 -.010 .340
-.710 -2.940 -4.070 -1.810 3.000 6.590 6.140 2.750
-.490 -2.460 -4.180 -5.660 -4.800 -.560 4.510 6.630
5.140 2.860 2.230 2.510 1.670 -.440 -2.030 -2.330
Note that there are a lot of whitespace characters between one value and the next.
I tried to read each line, and then split the line according to a ' ' character. My code is something like this:
public List<double> Parse(StreamReader sr)
{
var dataList = new List<double>();
while (sr.Peek() >= 0)
{
string line = sr.ReadLine();
if (lineCount > 1)
{
string[] columns = line.Split(' ');
for (var j = 0; j < columns.Length; j++)
{
dataList.Add(double.Parse(columns[j]));
}
}
}
return dataList ;
}
The problem with the above code is that it can only handle the case where values are separated by a single space.
Any ideas?

The simplest way is probably to use an overload of String.Split which includes a StringSplitOptions parameter, and specify StringSplitOptions.RemoveEmptyEntries.
I would also personally just call ReadLine until that returned null, rather than using TextReader.Peek. Aside from anything else, it's more general - it will work even if the underlying stream (if any) doesn't support seeking.
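Both suggestions could be combined along these lines. This is a sketch that assumes the values use '.' as the decimal separator regardless of the machine's culture; the StringReader merely stands in for the real file:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

static List<double> Parse(TextReader reader)
{
    var dataList = new List<double>();
    char[] separators = { ' ', '\t' };
    string line;
    // ReadLine returns null at the end of the input, so no Peek is needed.
    while ((line = reader.ReadLine()) != null)
    {
        // RemoveEmptyEntries drops the empty strings that runs of
        // consecutive separators would otherwise produce.
        string[] columns = line.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        foreach (string column in columns)
            dataList.Add(double.Parse(column, CultureInfo.InvariantCulture));
    }
    return dataList;
}

// StringReader stands in for a StreamReader over the real file.
var sample = new StringReader("   .640    .070   -.390\n  -1.360   -.360    .860\n");
List<double> values = Parse(sample);
Console.WriteLine(values.Count);   // 6
```

Taking a TextReader instead of a StreamReader keeps the method usable with files, strings and other sources alike.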

Before you do the split, replace all multi spaces with a single space, something like:
line = System.Text.RegularExpressions.Regex.Replace(line, @" +", " ");

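You can do this with one line of code. Assume your text is in a string named input.
string[] values = System.Text.RegularExpressions.Regex.Split(input, @"\s+");
This gives you all the values in a string array. If the input starts or ends with whitespace, call input.Trim() first; otherwise the array contains an empty string at that end.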

How to split a string while preserving line endings?

I have a block of text and I want to get its lines without losing the \r and \n at the end. Right now, I have the following (suboptimal) code:
string[] lines = tbIn.Text.Split('\n')
.Select(t => t.Replace("\r", "\r\n")).ToArray();
So I'm wondering - is there a better way to do it?

The following seems to do the job:
string[] lines = Regex.Split(tbIn.Text, @"(?<=\r\n)(?!$)");
(?<=\r\n) uses 'positive lookbehind' to match after \r\n without consuming it.
(?!$) uses negative lookahead to prevent matching at the end of the input and so avoids a final line that is just an empty string.

Something along the lines of using this regular expression:
[^\n\r]*\r\n
Then use Regex.Matches().
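The problem is that you need the matched text out of each match (the pattern has no capture groups, so each match's Value already includes the trailing \r\n) and have to create your string list from that. In Python you'd just use the map() function. Not sure of the best way to do it in .NET; you take it from there ;-)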
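In .NET, LINQ's Select plays the role of map(). Here is a sketch under the assumption that the text uses \r\n line endings; the pattern is extended with a second alternative (an addition to the expression above) so that a final line without a terminator is not lost:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "first\r\nsecond\r\n\r\nlast";

// Either a (possibly empty) line followed by \r\n, or a final
// non-empty line that has no terminator.
string[] lines = Regex.Matches(text, @"[^\r\n]*\r\n|[^\r\n]+")
    .Cast<Match>()                  // MatchCollection is non-generic on older frameworks
    .Select(match => match.Value)   // the whole match, terminator included
    .ToArray();

Console.WriteLine(lines.Length);    // 4
```

Concatenating the resulting array reproduces the original text, which is a quick way to check that no characters were dropped.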

Dmitri, your solution is actually pretty compact and straightforward. The only thing more efficient would be to keep the string-splitting characters in the generated array, but the APIs simply don't allow for that. As a result, every solution will require iterating over the array and performing some kind of modification (which in C# means allocating new strings every time). I think the best you can hope for is to not re-create the array:
string[] lines = tbIn.Text.Split('\n');
for (int i = 0; i < lines.Length; ++i)
{
lines[i] = lines[i].Replace("\r", "\r\n");
}
... but as you can see that looks a lot more cumbersome! If performance matters, this may be a bit better. If it really matters, you should consider manually parsing the string by using IndexOf() to find the '\r's one at a time, and then create the array yourself. This is significantly more code, though, and probably not necessary.
One of the side effects of both your solution and this one is that you won't get a terminating "\r\n" on the last line if there wasn't one already there in the TextBox. Is this what you expect? What about blank lines... do you expect them to show up in 'lines'?
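For reference, the manual IndexOf approach might look roughly like this. The sketch searches for '\n' rather than '\r' so that a complete "\r\n" pair stays attached to its line; the helper name is hypothetical:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: cuts the text after each '\n' and keeps the
// terminator on the line it ends. No Replace calls are needed.
static List<string> SplitKeepingNewlines(string text)
{
    var lines = new List<string>();
    int start = 0;
    while (start < text.Length)
    {
        int newline = text.IndexOf('\n', start);
        // No more terminators: the remainder is the last line.
        int end = newline < 0 ? text.Length : newline + 1;
        lines.Add(text.Substring(start, end - start));
        start = end;
    }
    return lines;
}

List<string> parts = SplitKeepingNewlines("one\r\ntwo\r\n\r\nthree");
Console.WriteLine(parts.Count);   // 4
```

With this version, blank lines show up as entries containing only "\r\n", and a last line without a terminator is returned unchanged.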

If you are just going to replace the newline (\n) then do something like this:
string[] lines = tbIn.Text.Split('\n')
.Select(t => t + "\r\n").ToArray();
Edit: Regex.Split allows you to split on a string.
string[] lines = Regex.Split(tbIn.Text, "\r\n")
.Select(t => t + "\r\n").ToArray();

As always, extension method goodies :)
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
usage:
string text = "One,Two,Three,Four";
foreach (var s in text.SplitAndKeep(","))
{
Console.WriteLine(s);
}
Output:
One,
Two,
Three,
Four

You can achieve this with a regular expression. Here's an extension method with it:
public static string[] SplitAndKeepDelimiter(this string input, string delimiter)
{
MatchCollection matches = Regex.Matches(input, @"[^" + delimiter + "]+(" + delimiter + "|$)", RegexOptions.Multiline);
string[] result = new string[matches.Count];
for (int i = 0; i < matches.Count ; i++)
{
result[i] = matches[i].Value;
}
return result;
}
I'm not sure if this is a better solution. Yours is very compact and simple.
