finding a variable from string input and extracting it with regex c#


I have a client that sends a message to my server, and I am trying to extract substrings from it into variables. I want to use a regex for this. The code compiles without errors, but the pattern does not match. This is the message I am sending, followed by my code.
" PUT /John\r\n\r\n
London "
private StreamReader sReader = null;
private StreamWriter sWriter = null;
public SocketClass(Socket s)
{
socket = s;
NetworkStream nStream = new NetworkStream(s);
sReader = new StreamReader(nStream);
sWriter = new StreamWriter(nStream);
startSocket();
}
String txt = "";
while (sReader.Peek() >= 0)
{
txt += sReader.ReadLine() + "\r\n";
}
else if (txt.Contains("PUT"))
{
Console.WriteLine("triggered");
Regex pattern = new Regex(@"PUT /(?<Name>\d+)\r\n\r\n(?<Location>\d+)\r\n");
Match match = pattern.Match(txt);
if (match.Success)
{
String Name = match.Groups["Name"].Value;
String Location = match.Groups["Location"].Value;
Console.WriteLine(Name);
Console.WriteLine(Location);
}
}

The problem seems to be that your input contains alphabetic characters, while your regex looks for \d, which matches only numeric digits. The regex can easily be changed to this to make it work:
Regex pattern = new Regex(@"PUT /(?<Name>.+)\r\n\r\n(?<Location>.+)\r\n");
. matches any character except a newline (\n). You could narrow it down further, for example to alphabetic characters only, but the above will work for your given input.
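As a minimal sketch of the narrower variant mentioned above, the groups can be restricted to "anything except a line break" so they cannot swallow part of a \r\n separator. The class name PutMessageParser and the method TryParse are made up for this illustration, and the sample message is assumed to have the shape shown in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PutMessageParser
{
    // [^\r\n]+ matches one or more characters that are not line breaks,
    // so neither group can consume part of a "\r\n" separator.
    private static readonly Regex PutPattern =
        new Regex(@"PUT /(?<Name>[^\r\n]+)\r\n\r\n(?<Location>[^\r\n]+)");

    // Hypothetical helper: returns true and fills name/location on a match.
    public static bool TryParse(string message, out string name, out string location)
    {
        Match match = PutPattern.Match(message);
        name = match.Success ? match.Groups["Name"].Value.Trim() : string.Empty;
        location = match.Success ? match.Groups["Location"].Value.Trim() : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        if (TryParse("PUT /John\r\n\r\nLondon\r\n", out string name, out string location))
        {
            Console.WriteLine(name);
            Console.WriteLine(location);
        }
    }
}
```

The trailing \r\n is not required by this pattern, so a message whose last line has no line terminator would still match.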

Related

Fetch particular string from particular block

I am new to C#. I have a text file, and I want to read the data from one particular block of lines in it.
The address keyword can occur multiple times in the text file.
Something here...
... ... ...
interface "system"
address 10.4.1.10/32
no shutdown
exit
something here...
... ... ...
address 101.4.1.11/32
But I want to capture only the one within this block:
interface "system"
address 10.4.1.10/32
no shutdown
exit
I want to capture this ip from the block:
10.4.1.10
I tried this code:
int counter = 0;
string line;
// Read the file and display it line by line.
System.IO.StreamReader file = new System.IO.StreamReader("c:\\test.txt");
while((line = file.ReadLine()) != null)
{
Console.WriteLine (line);
counter++;
}
file.Close();
// Suspend the screen.
Console.ReadLine();
Expected Output:
My expected output is the IP address from that block, i.e. 10.4.1.10.
That IP is inside the interface "system" block, which is what makes it unique, since there can be many IPs after the keyword address. So I want the address that is inside the interface "system" block.
Please let me know how I can capture a particular string from the block.

Regular Expressions are perfectly suited to handle this type of "problem". The following console app demonstrates how to use Regex to extract the desired IP address from the targeted string block.
private static readonly string IPV4_PATTERN = "[0-9./]";
private static readonly string IPV4_IPV6_PATTERN = "[A-Z0-9:./]";
static void Main(string[] args)
{
TestSearchFile();
}
private static string ParseIpWithRegex(string textToSearch, string startBlock, string endBlock)
{
var pattern = $@"{startBlock}\D*\s*({IPV4_IPV6_PATTERN}+).*{endBlock}";
var ms = Regex.Match(textToSearch, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
if (ms.Groups.TryGetValue("1", out var g))
{
return g.Value;
}
return string.Empty;
}
private static void TestSearchFile()
{
var sep = Environment.NewLine;
var ipAddress6 = "2001:db8:85a3:8d3:1319:8a2e:370:7348";
var ipAddress4 = "10.4.1.10/32";
var t = "Something here..." + sep;
t += "... ... ... " + sep;
t += "interface \"system\"" + sep;
t += "address " + ipAddress4 + sep;
t += "no shutdown" + sep;
t += "exit" + sep;
t += "something here..." + sep;
t += "address 101.4.1.11/32" + sep;
t += "... ... ... " + sep;
var startBlock = "interface \"system\"";
var endBlock = "exit";
var ip = ParseIpWithRegex(t, startBlock, endBlock);
Console.WriteLine($"IP: {ip}");
}
I've included two IP address patterns: IPV4_PATTERN for IPv4 only, and IPV4_IPV6_PATTERN for both IPv4 and IPv6. Select the one you feel is most appropriate. Although IPV4_IPV6_PATTERN applies to both IP versions, I believe performance improves slightly when the search uses the narrowest pattern.
Don't forget to import the Regex reference:
using System.Text.RegularExpressions;
**Code Explained**
The method "ParseIpWithRegex" builds a Regex pattern from the string that marks the start of the targeted block and the string that marks the end of that block. Nestled within that pattern is the regular expression character class that defines the IP address pattern we wish to isolate as a group.
$@"{startBlock}\D*\s*({IPV4_IPV6_PATTERN}+).*{endBlock}";
It should be noted that the curly brackets are just for string interpolation and have (in this case) nothing to do with the actual regular expression!
After the "startBlock" we see "\D*". This means that after the "startBlock" the search includes all non-digit characters (the "star" means zero or more). Then we see "\s*", which matches any white space, including new line characters ("\s" matches line breaks with or without RegexOptions.Singleline).
The IP address pattern is in parentheses "()", which instructs Regex to create a group. Behind the IP address pattern (in the above code example IPV4_IPV6_PATTERN) there is a "+" symbol. This indicates that there MUST be at least one of the characters from the IP address character class for it to be considered a "match".
After that we see ".*" in front of the "endBlock". This means to look for any character (zero or more) in front of the "endBlock" string, including the "new line" character, because RegexOptions.Singleline is set.
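The pattern explained above returns the address together with its /32 suffix, and it treats the block delimiters as regex text. As a hedged sketch of a variation (not the answer's own code), the delimiters can be passed through Regex.Escape and the search can be prevented from running past the end marker. The names BlockIpFinder and FindIp are invented for this example, and it assumes an IPv4 address on a line that starts with the word address.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlockIpFinder
{
    // Finds the first IPv4 address on an "address" line between startBlock and endBlock.
    public static string FindIp(string text, string startBlock, string endBlock)
    {
        // Regex.Escape makes the delimiters literal text.
        // (?:(?!end).)*? advances lazily but never steps over the end marker,
        // so an address that appears after the block is not picked up.
        string end = Regex.Escape(endBlock);
        string pattern =
            Regex.Escape(startBlock) +
            @"(?:(?!" + end + @").)*?\baddress\s+(?<ip>\d{1,3}(?:\.\d{1,3}){3})";

        Match m = Regex.Match(text, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["ip"].Value : string.Empty;
    }

    public static void Main()
    {
        string text = "interface \"system\"\naddress 10.4.1.10/32\nno shutdown\nexit\n";
        Console.WriteLine(FindIp(text, "interface \"system\"", "exit"));
    }
}
```

One caveat of this sketch: the end marker is matched as a plain substring, so a word such as "exited" inside the block would also end the search.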
If you have any questions, please leave a comment.
EDIT
From your button onclick method you will call SearchFileForIp. You will need to change myTextBox to match your code.
You should also decide whether you will be searching IPV4 or both IPV4 and IPV6 and select the appropriate variable IPV4_PATTERN or IPV4_IPV6_PATTERN.
private void SearchFileForIp()
{
var fileName = "c:\\test.txt";
using var sr = new StreamReader(fileName);
string fileContent = sr.ReadToEnd();
var startBlock = "interface \"system\"";
var endBlock = "exit";
var ip = ParseForIpRegex(fileContent, startBlock, endBlock);
myTextBox.Text = ip; //Change this to match your code
}
private readonly string IPV4_PATTERN = "[0-9./]";
private readonly string IPV4_IPV6_PATTERN = "[A-Z0-9:./]";
private string ParseForIpRegex(string textToSearch, string startBlock, string endBlock)
{
var pattern = $@"{startBlock}\D*\s*({IPV4_PATTERN}+).*{endBlock}";
var ms = Regex.Match(textToSearch, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
if (ms.Success) // Groups.Count is always at least 1, so test Success instead
{
return ms.Groups[1].Value;
}
//For .Net Core apps
//if (ms.Groups.TryGetValue("1", out var g))
//{
// return g.Value;
//}
return string.Empty;
}

In addition to the two answers with Regex solutions: if the address line always comes directly after interface "system", then a simple for loop can do the job.
interface "system"
address 10.4.1.10/32
no shutdown
exit
So we go through the file's lines and check whether a line is interface "system". If it is, we take the next line and parse the IP address out of it.
public static string GetIpAddressFromFile(string fileName, string startLine)
{
var lines = File.ReadAllLines(fileName);
var ipAddress = string.Empty;
for (var i = 0; i < lines.Length; i++)
{
var line = lines[i].Trim();
if (line != startLine) continue;
var addressLine = lines[i + 1].Trim().Replace("address", "");
ipAddress = addressLine.Substring(0, addressLine.IndexOf("/", StringComparison.Ordinal));
break;
}
return ipAddress.Trim();
}
Let's assume that your file is inconsistent and address does not come first after interface "system":
interface "system"
...
address 10.4.1.10/32
no shutdown
exit
In this case we put all lines between interface "system" and exit into a list of strings or a dictionary, and fetch the address key.
public static string GetIpAddressFromFile(string fileName, string startLine, string endLine)
{
var lines = File.ReadAllLines(fileName);
var ipAddress = string.Empty;
var state = false;
var results = new Dictionary<string, string>();
foreach (var t in lines)
{
var line = t.Trim();
if (line == startLine)
state = true;
if (line == endLine)
state = false;
if (!state) continue;
var s = line.Split(" ");
results.TryAdd(s[0], s[1]);
}
var result = results.GetValueOrDefault("address");
if (result != null)
{
ipAddress = result.Substring(0, result.IndexOf("/", StringComparison.Ordinal));
}
return ipAddress;
}
Usage:
var startLine = "interface \"system\"";
var endLine = "exit";
var ip = GetIpAddressFromFile(@"File.txt", startLine);
//Or
var ip = GetIpAddressFromFile(@"File.txt", startLine, endLine);
Both methods are tested with your given example and return:
10.4.1.10
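The dictionary version above indexes s[1] for every line inside the block, which would fail on a line without a space (such as "..."). A minimal sketch of the same line-scanning idea that avoids this is shown here; it works on lines already in memory so it can be tried without a file. BlockScanner and FindAddress are hypothetical names, and the sketch assumes the address line has the form "address <ip>/<prefix>".

```csharp
using System;
using System.Collections.Generic;

public static class BlockScanner
{
    // Returns the IP from the first "address" line found inside a start/end block.
    public static string FindAddress(IEnumerable<string> lines, string startLine, string endLine)
    {
        bool inBlock = false;
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (!inBlock)
            {
                inBlock = line == startLine; // enter the block on the start line
                continue;
            }
            if (line == endLine)
            {
                inBlock = false; // leave the block; a later block may still match
                continue;
            }
            if (!line.StartsWith("address ", StringComparison.Ordinal)) continue;

            // Drop the keyword, then cut off the "/32" style suffix if present.
            string value = line.Substring("address ".Length).Trim();
            int slash = value.IndexOf('/');
            return slash >= 0 ? value.Substring(0, slash) : value;
        }
        return string.Empty;
    }

    public static void Main()
    {
        var lines = new[] { "interface \"system\"", "...", "address 10.4.1.10/32", "no shutdown", "exit" };
        Console.WriteLine(FindAddress(lines, "interface \"system\"", "exit"));
    }
}
```

With a file, the same method could be fed from File.ReadLines(fileName), which reads lazily and stops as soon as the address is found.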

If the start and the end of the block are well defined, you can find the block like this:
1. Search for the start of the block
2. Do something with the lines until the end of the block
string line;
System.IO.StreamReader file = new System.IO.StreamReader("c:\\test.txt");
while((line = file.ReadLine()) != null && !line.Equals(START_OF_BLOCK)); // 1.
while((line = file.ReadLine()) != null && !line.Equals(END_OF_BLOCK)) // 2.
{
// do something with the lines
}
file.Close();
Updated answer after edited question:
To extract the IP address string inside the block, you could, for example, use regular expressions via the .NET Regex class, after first finding the needed block:
1. Search for the start of the block
2. Search for the line inside the block which contains "address"
3. Extract the IP address from the line using Regex.Match()
string line;
System.IO.StreamReader file = new System.IO.StreamReader("c:\\test.txt");
string pat = @"\b(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b";
System.Text.RegularExpressions.Regex reg = new System.Text.RegularExpressions.Regex(pat);
while ((line = file.ReadLine()) != null && !line.Equals(START_OF_BLOCK)); // 1. read from the file, not the console
while ((line = file.ReadLine()) != null && !line.Equals(END_OF_BLOCK)) // 2.
{
if (line.Contains("address"))
{
System.Text.RegularExpressions.Match ip = reg.Match(line);
Console.WriteLine(ip);
break; // break if you are sure there's only one ip in that block
}
}
file.Close();

Here is simple LINQ for that:
var textData = File.ReadAllLines("Path goes here");
var address = string.Join("", textData
.SkipWhile(x => !x.Trim().StartsWith($"interface \"system\""))
.SkipWhile(x => !x.Trim().StartsWith($"address"))
.Take(1)).Split("address")[1].Trim();
The first SkipWhile goes through the string array until it finds a line that starts with interface "system".
The second SkipWhile goes through the part after that line until it finds a line that starts with address.
Then Take(1) takes the matching line, and string.Join creates a string out of it.
Split then creates a new array containing the (empty) text before address and the text after it, which holds the IP address.
After that you simply take the last part of the array and trim it.

replace links in string with my link

I want to replace every link in a string with a link that I provide. What I have tried is:
StreamReader reader = new StreamReader(dd1.SelectedItem.Value);
string readFile = reader.ReadToEnd();
Regex regx = new Regex("http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*([a-zA-Z0-9\\?\\#\\=\\/]){1})?", RegexOptions.IgnoreCase);
string output=regx.ToString();
output = readFile;
MatchCollection matches = regx.Matches(output);
foreach (Match match in matches)
{
output = output.Replace(@"match.Value", @"http://localhost:61187/two?" + "sender=" + Server.UrlEncode(this.txtUsername.Text) + "&reciever=" + output);
}
Here, I have a string output which contains some links, so I used a regex to find the links in it. But the string output does not change: I get neither an error nor the expected result.

It seems to me that you should be using regx.Replace(...) instead:
StreamReader reader = new StreamReader(dd1.SelectedItem.Value);
string readFile = reader.ReadToEnd();
Regex regx = new Regex("http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*([a-zA-Z0-9\\?\\#\\=\\/]){1})?", RegexOptions.IgnoreCase);
string output = regx.ToString();
output = readFile;
string username = Server.UrlEncode(this.txtUsername.Text);
output = regx.Replace(output, new MatchEvaluator((match) =>
{
var url = Uri.EscapeDataString(match.Value);
return $"http://localhost:61187/two?sender={username}&receiver={url}";
}));
This will replace every match with the URL returned by the anonymous function.
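For readers who want to try the Replace-with-evaluator approach outside of a web page, here is a small self-contained sketch. It deliberately swaps in a much simpler URL pattern than the one in the question, and uses Uri.EscapeDataString in place of Server.UrlEncode so that it needs no ASP.NET types. LinkRewriter and Rewrite are hypothetical names; the target URL is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkRewriter
{
    // Simplified pattern for illustration only: the scheme plus everything
    // up to the next whitespace, double quote, or angle bracket.
    private static readonly Regex UrlPattern =
        new Regex(@"https?://[^\s""<>]+", RegexOptions.IgnoreCase);

    public static string Rewrite(string text, string sender)
    {
        string encodedSender = Uri.EscapeDataString(sender);
        // The lambda is called once per match; its return value replaces that match.
        return UrlPattern.Replace(text, match =>
            "http://localhost:61187/two?sender=" + encodedSender +
            "&receiver=" + Uri.EscapeDataString(match.Value));
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite("see http://example.com/a?b=1 now", "bob"));
    }
}
```

Because the original link is escaped before it is placed in the query string, characters such as ? and & inside it cannot break the new URL.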

The best way to split a string without a separator

I have string:
MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938
From the string above you can see that it is separated by colons (:), but in the actual environment it does not have a standard layout.
What is standard is the field names, for example MONEY-ID and MONEY-STAT.
How can I split it the right way and get the value that follows each field name?

Something like that should work:
string s = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
Regex regex = new Regex(@"MONEY-ID(?<moneyId>.*?)\:MONEY-STAT(?<moneyStat>.*?)\:MONEY-PAYetr-(?<moneyPaetr>.*?)$");
Match match = regex.Match(s);
if (match.Success)
{
Console.WriteLine("Money ID: " + match.Groups["moneyId"].Value);
Console.WriteLine("Money Stat: " + match.Groups["moneyStat"].Value);
Console.WriteLine("Money Paetr: " + match.Groups["moneyPaetr"].Value);
}
Console.WriteLine("hit <enter>");
Console.ReadLine();
UPDATE
Answering the additional question: if we're not sure of the format, then something like the following could be used:
string s = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
var itemsToExtract = new List<string> { "MONEY-STAT", "MONEY-PAYetr-", "MONEY-ID", };
string regexFormat = @"{0}(?<{1}>[\d]*?)[^\w]";//sample - MONEY-ID(?<moneyId>.*?)\:
foreach (var item in itemsToExtract)
{
string input = s + ":";// quick fix: append a separator so the last value is also followed by a non-word character
var match = Regex.Match(input, string.Format(regexFormat, item, "match"));
if (match.Success)
{
Console.WriteLine("Value of {0} is:{1}", item, match.Groups["match"]);
}
}
Console.WriteLine("hit <enter>");
Console.ReadLine();
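If the separator really is unreliable, another option is to let the field names themselves act as delimiters: a value then runs until the next known field name or the end of the string. The following is a sketch under that assumption, not a drop-in solution; FieldSplitter and Split are invented names, and the field list must be supplied by the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldSplitter
{
    public static Dictionary<string, string> Split(string input, IEnumerable<string> fieldNames)
    {
        // Longest names first, so a name that is a prefix of another cannot shadow it.
        var names = new List<string>(fieldNames);
        names.Sort((a, b) => b.Length.CompareTo(a.Length));
        string alternation = string.Join("|", names.ConvertAll(n => Regex.Escape(n)));

        // A value extends lazily until the next field name
        // (optionally preceded by ':') or the end of the input.
        string pattern = "(?<field>" + alternation + ")(?<value>.*?)(?=:?(?:" + alternation + ")|$)";

        var result = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            result[m.Groups["field"].Value] = m.Groups["value"].Value;
        }
        return result;
    }

    public static void Main()
    {
        var fields = Split("MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938",
            new[] { "MONEY-ID", "MONEY-STAT", "MONEY-PAYetr-" });
        foreach (var pair in fields)
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

The same call also handles input without colons, such as "MONEY-ID123456MONEY-STAT43", because the lookahead treats the colon as optional.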

As Andre said, I would personally go with regular expressions.
Use groups of something like,
"MONEY-ID(?<moneyid>.*)MONEY-STAT(?<moneystat>.*)MONEY-PAYetr(?<moneypay>.*)"
See this post for how to extract the groups.
Probably followed by a private method that trims off illegal characters in the matched group (e.g. : or -).

Check this out:
string regex = @"^(?i:money-id)(?<moneyid>.*)(?i:money-stat)(?<moneystat>.*)(?i:money-pay)(?<moneypay>.*)$";
string input = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
Match regexMatch = Regex.Match(input, regex);
string moneyID = regexMatch.Groups["moneyid"].Captures[0].Value.Trim();
string moneyStat = regexMatch.Groups["moneystat"].Captures[0].Value.Trim();
string moneyPay = regexMatch.Groups["moneypay"].Captures[0].Value.Trim();

Try
string data = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
data = data.Replace("MONEY-", ";");
string[] myArray = data.Split(';');
foreach (string s in myArray)
{
if (!string.IsNullOrEmpty(s))
{
if (s.StartsWith("ID"))
{
}
else if (s.StartsWith("STAT"))
{
}
else if (s.StartsWith("PAYetr"))
{
}
}
}
results in
ID123456:
STAT43:
PAYetr-1232832938

For example, using regular expressions,
(?<=MONEY-ID)(\d)*
It will extract
123456
from your string.

Saving an XML that has invalid characters

There are code snippets that strip the invalid characters from a string before it is saved as XML, but I have one more problem. Let's say my user wants a column name like "[MyColumnOne]". I do not want to strip the "[" and "]" characters, because the user defined them and wants to see them. The stripping code I use removes "[" and "]" as well, but in this case I still need them to be saved. What can I do?

Never mind, I changed my RegEx pattern to use XML 1.1 instead of XML 1.0 and now it works well:
string pattern = String.Empty;
//pattern = @"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|7F|8[0-46-9A-F]9[0-9A-F])"; //XML 1.0
pattern = @"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|[19][0-9A-F]|7F|8[0-46-9A-F]|0?[1-8BCEF])"; // XML 1.1
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
if (regex.IsMatch(sString))
{
sString = regex.Replace(sString, String.Empty);
File.WriteAllText(sString, sString, Encoding.UTF8);
}
return sString;

This worked for me, and it was fast.
private object NormalizeString(object p) {
object result = p;
if (p is string || p is long) {
string s = string.Format("{0}", p);
string resultString = s.Trim();
if (string.IsNullOrWhiteSpace(resultString)) return "";
Regex rxInvalidChars = new Regex("[\r\n\t]+", RegexOptions.IgnoreCase);
if (rxInvalidChars.IsMatch(resultString)) {
resultString = rxInvalidChars.Replace(resultString, " ");
}
//string pattern = String.Empty;
//pattern = @"";
////pattern = @"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|7F|8[0-46-9A-F]9[0-9A-F])"; //XML 1.0
////pattern = @"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|[19][0-9A-F]|7F|8[0-46-9A-F]|0?[1-8BCEF])"; // XML 1.1
//Regex rxInvalidXMLChars = new Regex(pattern, RegexOptions.IgnoreCase);
//if (rxInvalidXMLChars.IsMatch(resultString)) {
// resultString = rxInvalidXMLChars.Replace(resultString, "");
//}
result = string.Join("", resultString.Where(c => c >= ' '));
}
return result;
}
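As an alternative to hand-written patterns, the framework's own check can decide which characters are legal in XML text. The sketch below uses XmlConvert.IsXmlChar from System.Xml; XmlTextCleaner and RemoveInvalidXmlChars are hypothetical names. It treats each UTF-16 code unit on its own, which is an assumption that holds for ordinary text but means characters stored as surrogate pairs would need extra handling (XmlConvert.IsXmlSurrogatePair exists for that).

```csharp
using System;
using System.Linq;
using System.Xml;

public static class XmlTextCleaner
{
    // Keeps only characters that XmlConvert reports as valid XML characters.
    // '[' and ']' are valid, so they survive; control characters such as
    // U+0001 are dropped. Tab, CR and LF are valid and are kept.
    public static string RemoveInvalidXmlChars(string text)
    {
        return new string(text.Where(c => XmlConvert.IsXmlChar(c)).ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(RemoveInvalidXmlChars("[MyColumnOne]\u0001"));
    }
}
```

This covers the case where the column name is stored as text or as an attribute value. If the name were used as an element or attribute name instead, brackets would not be allowed there, and XmlConvert.EncodeName may be worth a look.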

RegEx -- getting rid of double whitespaces?

I have an app that replaces "invalid" chars (as defined by my Regex) with a blank space. If there are 2 or more consecutive blank spaces in the filename, I want them collapsed into one. For example:
Deal A & B.txt, after my app runs, would be renamed to Deal A   B.txt (3 spaces between A and B). What I really want is this: Deal A B.txt (one space between A and B).
I'm trying to determine how to do this. I suppose my app will have to run through all filenames at least once to replace invalid chars, and then run through the filenames again to get rid of extraneous whitespace.
Can anybody help me with this?
Here is my code currently for replacing the invalid chars:
public partial class CleanNames : Form
{
public CleanNames()
{
InitializeComponent();
}
public void Sanitizer(List<string> paths)
{
string regPattern = (@"[~#&$!%+{}]+");
string replacement = " ";
Regex regExPattern = new Regex(regPattern);
StreamWriter errors = new StreamWriter(@"S:\Testing\Errors.txt", true);
var filesCount = new Dictionary<string, int>();
dataGridView1.Rows.Clear();
try
{
foreach (string files2 in paths)
{
string filenameOnly = System.IO.Path.GetFileName(files2);
string pathOnly = System.IO.Path.GetDirectoryName(files2);
string sanitizedFileName = regExPattern.Replace(filenameOnly, replacement);
string sanitized = System.IO.Path.Combine(pathOnly, sanitizedFileName);
if (!System.IO.File.Exists(sanitized))
{
DataGridViewRow clean = new DataGridViewRow();
clean.CreateCells(dataGridView1);
clean.Cells[0].Value = pathOnly;
clean.Cells[1].Value = filenameOnly;
clean.Cells[2].Value = sanitizedFileName;
dataGridView1.Rows.Add(clean);
System.IO.File.Move(files2, sanitized);
}
else
{
if (filesCount.ContainsKey(sanitized))
{
filesCount[sanitized]++;
}
else
{
filesCount.Add(sanitized, 1);
}
string newFileName = String.Format("{0}{1}{2}",
System.IO.Path.GetFileNameWithoutExtension(sanitized),
filesCount[sanitized].ToString(),
System.IO.Path.GetExtension(sanitized));
string newFilePath = System.IO.Path.Combine(System.IO.Path.GetDirectoryName(sanitized), newFileName);
System.IO.File.Move(files2, newFilePath);
sanitized = newFileName;
DataGridViewRow clean = new DataGridViewRow();
clean.CreateCells(dataGridView1);
clean.Cells[0].Value = pathOnly;
clean.Cells[1].Value = filenameOnly;
clean.Cells[2].Value = newFileName;
dataGridView1.Rows.Add(clean);
}
}
}
catch (Exception e)
{
errors.Write(e);
}
}
private void SanitizeFileNames_Load(object sender, EventArgs e)
{ }
private void dataGridView1_CellContentClick(object sender, DataGridViewCellEventArgs e)
{
}
private void button1_Click(object sender, EventArgs e)
{
Application.Exit();
}
}
The problem is that not all files will have the same number of blank spaces after a rename. For example, I could have Deal A&B.txt, which after a rename would become Deal A B.txt (1 space between A and B, which is fine). But I will also have files like Deal A & B & C.txt, which after a rename is Deal A   B   C.txt (3 spaces between A, B and C, which is not acceptable).
Does anybody have any ideas/code for how to accomplish this?

Do the local equivalent of:
s/\s+/ /g;

Just add a space to your regPattern. Any collection of invalid characters and spaces will be replaced with a single space. You may waste a little bit of time replacing a space with a space, but on the other hand you won't need a second string manipulation call.
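A minimal sketch of that suggestion: adding \s to the character class lets one Replace call turn any run of invalid characters and whitespace into a single space. The class name FileNameCleaner and the method Clean are made up for this example; the set of invalid characters is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameCleaner
{
    // The original character class plus \s: a run such as " & " is now
    // matched as one unit and replaced by exactly one space.
    private static readonly Regex InvalidOrSpace = new Regex(@"[~#&$!%+{}\s]+");

    public static string Clean(string fileName)
    {
        return InvalidOrSpace.Replace(fileName, " ");
    }

    public static void Main()
    {
        Console.WriteLine(Clean("Deal A & B & C.txt"));
    }
}
```

Note that \s also matches tabs and line breaks, so those are normalized to a space as well; use a literal space in the class instead if that is not wanted.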

Does this help?
var regex = new System.Text.RegularExpressions.Regex("\\s{2,}");
var result = regex.Replace("Some text with a lot of spaces, and 2\t\ttabs.", " ");
Console.WriteLine(result);
output is:
Some text with a lot of spaces, and 2 tabs.
It just replaces any sequence of 2 or more whitespace characters with a single space...
Edit:
To clarify, I would just perform this regex right after your existing one:
public void Sanitizer(List<string> paths)
{
string regPattern = (@"[~#&$!%+{}]+");
string replacement = " ";
Regex regExPattern = new Regex(regPattern);
Regex regExPattern2 = new Regex(@"\s{2,}");
and:
foreach (string files2 in paths)
{
string filenameOnly = System.IO.Path.GetFileName(files2);
string pathOnly = System.IO.Path.GetDirectoryName(files2);
string sanitizedFileName = regExPattern.Replace(filenameOnly, replacement);
sanitizedFileName = regExPattern2.Replace(sanitizedFileName, replacement); // clean up whitespace
string sanitized = System.IO.Path.Combine(pathOnly, sanitizedFileName);
I hope that makes more sense.

you can perform another regex replace after your first one
@" +" -> " "

As Fosco said, with formatting:
while (mystring.Contains("  ")) mystring = mystring.Replace("  "," ");
//                        ||                                 ||   |

After you're done sanitizing it your way, simply replace 2 spaces with 1 space, while 2 spaces exist in the string.
while (mystring.Contains("  ")) mystring = mystring.Replace("  "," ");
I think that's the right syntax...
