I'm using Regex to match text from a file. I want to match two different strings that each appear more than once, which is why I'm using a loop. I can match one string, but not both together.
Regex celcius = new Regex(@"""temp"":\d*\.?\d{1,3}");
foreach (Match match in celcius.Matches(htmlcode))
{
Regex date = new Regex(@"\d{4}-\d{2}-\d{2}");
foreach (Match match1 in date.Matches(htmlcode))
{
string date1 = Convert.ToString(match1.Value);
string temperature = Convert.ToString(match.Value);
Console.Write(temperature + "\t" + date1);
}
}
htmlcode:
{"temp":287.05,"temp_min":286.932,"temp_max":287.05,"pressure":1019.04,"sea_level":1019.04,"grnd_level":1001.11,"humidity":89,"temp_kf":0.12},"weather":[{"id":804,"main":"Clouds","description":"overcast
clouds","icon":"04n"}],"clouds":{"all":100},"wind":{"speed":0.71,"deg":205.913},"sys":{"pod":"n"},"dt_txt":"2019-09-22
21:00:00"},{"dt":1569196800,"main":{"temp":286.22,"temp_min":286.14,"temp_max":286.22,"pressure":1019.27,"sea_level":1019.27,"grnd_level":1001.49,"humidity":90,"temp_kf":0.08},"weather":[{"id":804,"main":"Clouds","description":"overcast
clouds","icon":"04n"}],"clouds":{"all":99},"wind":{"speed":0.19,"deg":31.065},"sys":{"pod":"n"},"dt_txt":"2019-09-23
00:00:00"},{"dt":1569207600,"main":{"temp":286.04,"temp_min":286,"temp_max":286.04,"pressure":1019.38,"sea_level":1019.38,"grnd_level":1001.03,"humidity":89,"temp_kf":0.04},"weather":
You can use a single Regex pattern with two capturing groups for temperature and date. The pattern can look something like this:
("temp":\d*\.?\d{1,3}).*?(\d{4}-\d{2}-\d{2})
Regex demo.
C# example:
string htmlcode = // ...
var matches = Regex.Matches(htmlcode, @"(""temp"":\d*\.?\d{1,3}).*?(\d{4}-\d{2}-\d{2})");
foreach (Match m in matches)
{
Console.WriteLine(m.Groups[1].Value + "\t" + m.Groups[2].Value);
}
Output:
"temp":287.05 2019-09-22
"temp":286.22 2019-09-23
Try it online.
I don't think you have HTML. I think you have a collection of JSON (JavaScript Object Notation) objects, which is a lightweight way to pass data around.
So, this is one of your "HTML" objects.
{
"temp":287.05,
"temp_min":286.932,
"temp_max":287.05,
"pressure":1019.04,
"sea_level":1019.04,
"grnd_level":1001.11,
"humidity":89,
"temp_kf":0.12},
"weather":[{
"id":804,
"main":"Clouds",
"description":"overcast clouds",
"icon":"04n"
}],
"clouds":{
"all":100
},
"wind":{
"speed":0.71,"deg":205.913
},
"sys":{
"pod":"n"
},
"dt_txt":"2019-09-22 21:00:00"
}
So, I would recommend converting the line using the C# web helpers and parsing the objects directly.
//include this library
using System.Web.Helpers;
//parse your htmlcode using this loop (assumes htmlcode is a collection of JSON object strings)
foreach(var line in htmlcode)
{
dynamic data = Json.Decode(line);
string temperature = Convert.ToString(data["temp"]);
string date = Convert.ToDateTime(data["dt_txt"]).ToString("yyyy-MM-dd");
Console.WriteLine($"temperature: {temperature} date: {date}");
}
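If a newer runtime is available, the same idea can be sketched with System.Text.Json (.NET Core 3.0 or later) instead of the web helpers. This is only a sketch under assumptions: it assumes the entries arrive as one JSON array and that each entry has a nested "main" object holding "temp", as the longer sample in the question suggests. ForecastParser and the trimmed sample string are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

public static class ForecastParser
{
    // Returns (temperature, date) pairs from a JSON array of forecast entries.
    // Assumed shape per entry: { "main": { "temp": 287.05 }, "dt_txt": "2019-09-22 21:00:00" }
    public static List<(double Temp, string Date)> Parse(string json)
    {
        var result = new List<(double Temp, string Date)>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement entry in doc.RootElement.EnumerateArray())
            {
                double temp = entry.GetProperty("main").GetProperty("temp").GetDouble();
                string dtTxt = entry.GetProperty("dt_txt").GetString();
                // Keep only the date part of the timestamp.
                string date = DateTime.Parse(dtTxt, CultureInfo.InvariantCulture).ToString("yyyy-MM-dd");
                result.Add((temp, date));
            }
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical, trimmed-down sample shaped like the data in the question.
        string json = "[{\"main\":{\"temp\":287.05},\"dt_txt\":\"2019-09-22 21:00:00\"}," +
                      "{\"main\":{\"temp\":286.22},\"dt_txt\":\"2019-09-23 00:00:00\"}]";
        foreach (var (temp, date) in Parse(json))
        {
            Console.WriteLine(temp.ToString(CultureInfo.InvariantCulture) + "\t" + date);
        }
    }
}
```

Because the parser walks the real structure, every temperature stays paired with the date from the same entry, which the nested regex loops in the question cannot guarantee.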
I'm having issues doing a find/replace type of action in my function. I'm extracting the <a href="link">anchor from an article and replacing it with this format: [link anchor]. The link and anchor will be dynamic, so I can't hard-code the values. What I have so far is:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
string theString = string.Empty;
switch (articleWikiCheck) {
case "id|wpTextbox1":
StringBuilder newHtml = new StringBuilder(articleBody);
Regex r = new Regex(@"\<a href=\""([^\""]+)\"">([^<]+)");
string final = string.Empty;
foreach (var match in r.Matches(theString).Cast<Match>().OrderByDescending(m => m.Index))
{
string text = match.Groups[2].Value;
string newHref = "[" + match.Groups[1].Index + " " + match.Groups[1].Index + "]";
newHtml.Remove(match.Groups[1].Index, match.Groups[1].Length);
newHtml.Insert(match.Groups[1].Index, newHref);
}
theString = newHtml.ToString();
break;
default:
theString = articleBody;
break;
}
Helpers.ReturnMessage(theString);
return theString;
}
Currently, it just returns the article as it originally is, with the traditional anchor format: <a href="link">anchor
Can anyone see what I have done wrong?
regards
If your input is HTML, you should consider using a corresponding parser, HtmlAgilityPack being really helpful.
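As for the current code, note that the loop calls r.Matches(theString) while theString is still empty, so nothing is ever matched and the article comes back unchanged. It is also more verbose than it needs to be. You may use a single Regex.Replace to perform the search and replace in one pass: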
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody, @"<a\s+href=""([^""]+)"">([^<]+)", "[$1 $2]");
}
else
{
// Helpers.ReturnMessage(articleBody); // Uncomment if it is necessary
return articleBody;
}
}
See the regex demo.
The <a\s+href="([^"]+)">([^<]+) regex matches <a, 1 or more whitespaces, href=", then captures into Group 1 any one or more chars other than ", then matches "> and then captures into Group 2 any one or more chars other than <.
The [$1 $2] replacement replaces the matched text with [, Group 1 contents, space, Group 2 contents and a ].
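To see the replacement on a concrete string, here is a small sketch. One caveat: the pattern above stops before any closing tag, so a trailing </a> would be left in the text. The variant below appends </a> to the pattern, which assumes every anchor in the input is closed that way. AnchorDemo, ToWikiLinks and the sample sentence are made-up names for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Rewrites <a href="link">anchor</a> as [link anchor].
    // Group 1 is the href value, group 2 is the anchor text.
    public static string ToWikiLinks(string html)
    {
        return Regex.Replace(html, @"<a\s+href=""([^""]+)"">([^<]+)</a>", "[$1 $2]");
    }

    public static void Main()
    {
        string input = "See <a href=\"http://example.com\">Example</a> for details.";
        // Prints: See [http://example.com Example] for details.
        Console.WriteLine(ToWikiLinks(input));
    }
}
```

Anchors with extra attributes or nested markup will not match this pattern, which is one more reason to prefer an HTML parser for anything beyond simple input.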
Updated (Corrected regex to support whitespaces and new lines)
You can try this expression
Regex r = new Regex(@"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>");
It will match your anchors even if they are split across multiple lines. It is so long because it allows whitespace between the tags and their values, and .NET regex does not support subroutines, so the [\s\n]* part has to be repeated multiple times.
You can see a working sample at dotnetfiddle
You can use it in your example like this.
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody,
@"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>",
"[${link} ${anchor}]");
}
else
{
return articleBody;
}
}
I am using "nslookup" to get machine name from IP.
nslookup 1.2.3.4
The output is multiline and the machine name has a dynamic length. How can I extract "DynamicLengthString" from the output? All suggestions IndexOf and Split, but when I try to do like that, I was not a good solution for me. Any advice?
Server: volvo.toyota.opel.tata
Address: 5.6.7.8
Name: DynamicLengthString.toyota.opel.tata
Address: 1.2.3.4
I did it the good old C# way, without regex.
string input = @"Server: volvo.toyota.opel.tata
Address: 5.6.7.8
Name: DynamicLengtdfdfhString.toyota.opel.tata
Address: 1.2.3.4";
string targetLineStart = "Name:";
string[] allLines = input.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
string targetLine = String.Empty;
foreach (string line in allLines)
if (line.StartsWith(targetLineStart))
{
targetLine = line;
}
System.Console.WriteLine(targetLine);
string dynamicLengthString = targetLine.Remove(0, targetLineStart.Length).Split('.')[0].Trim();
System.Console.WriteLine("<<" + dynamicLengthString + ">>");
System.Console.ReadKey();
This extracts "DynamicLengtdfdfhString" from the given input, no matter where the Name-Line is and no matter what comes afterwards.
This is the console version to test & verify it.
You can use Regex
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string Content = "Server: volvo.toyota.opel.tata \rAddress: 5.6.7.8 \rName: DynamicLengthString.toyota.opel.tata \rAddress: 1.2.3.4";
// Look behind for "Name:" plus any whitespace, then take everything up to the first dot or whitespace.
string Pattern = @"(?<=Name:\s*)[^.\s]+";
MatchCollection matchList = Regex.Matches(Content, Pattern);
Console.WriteLine("Running");
foreach(Match match in matchList)
{
Console.WriteLine(match.Value);
}
}
}
I'm going to assume your output is exactly like you put it.
string output = ExactlyAsInTheQuestion();
var fourthLine = output.Split(Environment.NewLine)[3];
var nameValue = fourthLine.Substring(9); //skips over "Name: "
var firstPartBeforePeriod = nameValue.Split('.')[0];
//firstPartBeforePeriod should equal "DynamicLengthString"
Note that this is a barebones example:
Either check all array indexes before you access them, or be prepared to catch IndexOutOfRangeExceptions.
I've assumed that the four spaces between "Name:" and "DynamicLengthString" are four spaces. If they are a tab character, you'll need to adjust the Substring(9) method to Substring(6).
If "DynamicLengthString" is supposed to also have periods in its value, then my answer does not apply. You'll need to use a regex in that case.
Note: I'm aware that you dismissed Split:
All suggestions IndexOf and Split, but when I try to do like that, I was not a good solution for me.
But based on only this description, it's impossible to know if the issue was in getting Split to work, or it actually being unusable for your situation.
I have a string which is somewhat like this:
string data = "I have a {apple} and a {orange}";
I need to extract the content inside {}, let's say for 10 times
I tried this
string[] split = data.Split(new char[] { '{', '}' }, StringSplitOptions.RemoveEmptyEntries);
The problem is my data is going to be dynamic and I wouldn't know at what instance the {<>} would be present, it can also be something like this
Give {Pen} {Pencil}
I guess the above method wouldn't work, so I would really like to know a dynamic way to do this. Any input would be really helpful.
Thanks and Regards
Try this:
string data = "I have a {apple} and a {orange}";
Regex rx = new Regex("{(.*?)}");
foreach (Match item in rx.Matches(data))
{
Console.WriteLine(item.Groups[1].Value);
}
You need to use Regex to get all values you need.
If the string between {} does not contain nested {} you can use a regex to perform this task:
string data = "I have a {apple} and a {orange}";
Regex reg = new Regex(@"\{(?<Name>[A-Za-z0-9]*)\}");
var matches = reg.Matches(data);
foreach (var m in matches.OfType<Match>())
{
Console.WriteLine($"Found {m.Groups["Name"].Value} at {m.Index}");
}
To replace the strings between {} you can use Regex.Replace:
reg.Replace(data, m => m.Groups["Name"].Value + "_")
// Will produce "I have a apple_ and a orange_"
To get the rest of the string, you can use Regex.Split:
Regex reg2 = new Regex(@"\{[A-Za-z0-9]*\}");
var result = reg2.Split(data);
// will contain "I have a ", " and a ", "", you might want to remove ""
As I understand, you want to split that string into parts like this:
I have a
{apple}
and a
{orange}
And then you want to go over those parts and do something with them, and that something is different depending on whether part is enclosed in {} or not. If so - you need Regex.Split:
string data = "I have a {apple} and a {orange}";
var parts = Regex.Split(data, @"({.*?})");
foreach (var part in parts) {
if (part.StartsWith("{") && part.EndsWith("}")) {
var trimmed = part.TrimStart('{').TrimEnd('}');
// "apple" and "orange" go here
// do something with {} part
}
else {
// "I have a " and " and a " go here
// do something with other part
}
}
I am using a string list in C#, which contains a list of subjects,
e.g. art, science, music.
I then have the user input "I would like to study science and art."
I would like to store the results into a variable, but I get lots of duplicates like "science, sciencemusic" (that's not a typo).
I think it's from the looping of the for each statement. Could there be an easier way to do this or is there something wrong in my code? I can't figure it out.
Here's my code:
string input = "I would like to study science and art.";
string result = "";
foreach (string sub in SubjectsClass.SubjectsList)
{
Regex rx = new Regex(sub, RegexOptions.IgnoreCase);
MatchCollection matches = rx.Matches(input);
foreach (Match match in matches)
{
result += match.Value;
}
}
The subjects class function "SubjectsList" is read from a CSV file with only words in it of random subjects:
CSV File:
Computing
English
Maths
Art
Science
Engineering
private List<string> subjects = new List<string>();
//Read data from csv file to list...
public List<string> SubjectsList
{
get { return subjects; }
}
Currently the output I get is this:
"input": "art science",
"Subject": "artscienceartscienceartscience"
If I change:
result += match.Value;
to
result += match.Value + " ";
I get lots of spaces.
Edit: I should mention that this code runs on a WPF C# button press and then shows the result.
Using your code, and with the following test data:
List<string> subjects = new List<string>{"Science", "Art", "Maths"};
string input = "I would like to study science and art.";
I don't get duplicates.
To avoid blank matches, check that the value is not empty before appending it:
foreach (Match match in matches)
{
if (!string.IsNullOrEmpty(match.Value))
{
result += match.Value + " ";
}
}
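Another way to keep the result clean is sketched below. It rests on a few guesses about the cause: a blank line in the CSV gives an empty pattern, which matches at every position, and a result field that is not reset between button presses keeps growing. The sketch skips blank subjects, matches whole words only, keeps each hit once, and builds the string fresh on every call. SubjectMatcher and FindSubjects are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SubjectMatcher
{
    // Returns the subjects found in the input, each at most once, joined by ", ".
    public static string FindSubjects(string input, IEnumerable<string> subjects)
    {
        var found = new List<string>();
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string sub in subjects)
        {
            // An empty pattern matches everywhere, so skip blank entries.
            if (string.IsNullOrWhiteSpace(sub)) continue;

            // \b keeps "art" from matching inside words such as "start";
            // Regex.Escape protects subjects containing characters like "+" or ".".
            string pattern = @"\b" + Regex.Escape(sub.Trim()) + @"\b";
            Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
            if (m.Success && seen.Add(m.Value))
            {
                found.Add(m.Value);
            }
        }
        return string.Join(", ", found);
    }

    public static void Main()
    {
        var subjects = new List<string> { "Computing", "English", "Maths", "Art", "", "Science", "Engineering" };
        // Prints: art, science
        Console.WriteLine(FindSubjects("I would like to study science and art.", subjects));
    }
}
```

The hits come out in the order of the subject list, not the order in which they appear in the sentence.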
I am coding a converter program that converts an old version of code to the new version: you put the old text in a text box and it converts TXT to XML. I'm trying to get each item between two characters, and below is the string I'm trying to split. I have put just the name of each param in the " " to protect my users' credentials. So I want to get every part of the code between the ","
["Id","Username","Cash","Password"],["Id","Username","Cash","Password"]
And then add each string to a list so it would be like
Item 1
["Id","Username","Cash","Password"]
Item 2
["Id","Username","Cash","Password"]
I would split it using "," but that would break, because there is also a "," between the params inside each item, so I tried using "],"
string input = textBox1.Text;
string[] parts1 = input.Split(new string[] { "]," }, StringSplitOptions.None);
foreach (string str in parts1)
{
//Params is a list...
Params.Add(str);
}
MessageBox.Show(string.Join("\n\n", Params));
But it takes the ] off the end of each one, and it messes up in other ways.
This looks like a great opportunity for Regular Expressions.
My approach would be to get the row parts first, then get the column parts. I'm sure there are about 30 ways to do this, but this is my (simplistic) approach.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var rowPattern = new Regex(@"(?<row>\[[^]]+\])", RegexOptions.Multiline | RegexOptions.ExplicitCapture);
var columnPattern = new Regex(@"(?<column>\"".+?\"")", RegexOptions.Multiline | RegexOptions.ExplicitCapture);
var data = "[\"Id\",\"Username\",\"Cash\",\"Password\"],[\"Id\",\"Username\",\"Cash\",\"Password\"]";
var rows = rowPattern.Matches(data);
var rowCounter = 0;
foreach (var row in rows)
{
Console.WriteLine("Row #{0}", ++rowCounter);
var columns = columnPattern.Matches(row.ToString());
foreach (var column in columns)
Console.WriteLine("\t{0}", column);
}
Console.ReadLine();
}
}
}
Hope this helps!!
You can use Regex.Split() together with positive lookbehind and lookahead to do this:
var parts = Regex.Split(input, "(?<=]),(?=\\[)");
Basically this says “split on , with ] right before it and [ right after it”.
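As a quick check of that behaviour, here is a minimal sketch using the sample string from the question. RowSplitter and SplitRows are made-up names. Since the lookbehind and lookahead consume nothing, only the comma itself is removed and both brackets survive.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RowSplitter
{
    // Splits on a comma only when "]" is directly before it and "[" directly after it.
    public static string[] SplitRows(string input)
    {
        return Regex.Split(input, @"(?<=\]),(?=\[)");
    }

    public static void Main()
    {
        string input = "[\"Id\",\"Username\",\"Cash\",\"Password\"],[\"Id\",\"Username\",\"Cash\",\"Password\"]";
        string[] parts = SplitRows(input);
        for (int i = 0; i < parts.Length; i++)
        {
            // Each part keeps its own opening and closing bracket.
            Console.WriteLine("Item " + (i + 1) + ": " + parts[i]);
        }
    }
}
```

The commas between the quoted params are left alone because they are never surrounded by "]" and "[".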
Assuming that the character '|' does not occur in your original data, you can try:
input.Replace("],[", "]|[").Split(new char[]{'|'});
If the pipe character does occur, use another (non-occurring) character.