This is a follow-up to my deleted question here:
https://stackoverflow.com/questions/20843964/need-regex-to-find-text-in-c-sharp
I have a string similar to this:
<Label Content="Hi"/>
<SomeControl Header ="welcome"/>
From the above string data, I want to get:
Content="Hi"
Header="welcome"
The regex can be a single combined expression or two separate ones to get these two strings.
The question was deleted before I could edit it. This is the edit I wanted to make:
Using online regex testers, I came up with the following:
Content="[^"]*")
But when I put the same pattern in a C# string, I get an error:
string expr = @"Content="[^"]*")";
I know some escape sequence is needed, and I wanted to find out which one, but I haven't been able to yet. The reason I want to scan an XML (XAML) file like this is to count the number of occurrences of hard-coded strings. So I don't need any XML parsing or anything like that, just a plain, simple regex to count such strings.
I understand your point.
You can definitely capture multiple results with a regex when you're sure other approaches aren't appropriate (although XML is a pretty handy format).
Did you perhaps mean to use .Matches (not .Match) at the end, though?
string expr = @"Content=""[^""]*""";
System.Text.RegularExpressions.Regex reg = new System.Text.RegularExpressions.Regex(expr);
string data = @"<SomeControl Content=""sup""><anotherControl Content=""hey""><athird Content=""yo""></athird></anotherControl></SomeControl>"; // This will be replaced with actual file content
var res = reg.Matches(data);
var occurrenceCount = res.Count; // 3 for the sample data above
In the end, the following worked for me:
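using System.Text.RegularExpressions;
string expr = @"Content=""[^""]*""";
Regex reg = new Regex(expr);
string data = @"<SomeControl Content=""Hi"">"; // This will be replaced with actual file content
// Count every occurrence. Match(data).Groups.Count would only count the capture
// groups of the first match (always 1 for this pattern), not the number of matches.
var occurrenceCount = reg.Matches(data).Count;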
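Since the question allows a single combined expression, here is a minimal sketch that finds both attributes at once and counts them. It assumes the attribute names are exactly Content and Header and that values never contain escaped quotes; the \s* around = handles the Header ="welcome" case from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample XAML from the question; note the space before '=' in Header ="welcome".
string xaml = "<Label Content=\"Hi\"/>\n<SomeControl Header =\"welcome\"/>";

// Attribute name (Content or Header), optional whitespace around '=',
// then a double-quoted value. In a verbatim string, "" is one literal quote.
var attrPattern = new Regex(@"\b(Content|Header)\s*=\s*""([^""]*)""");

MatchCollection found = attrPattern.Matches(xaml);
int hardCodedCount = found.Count;

foreach (Match m in found)
{
    // Rebuild each hit as Name="value", so Header ="welcome" prints as Header="welcome".
    Console.WriteLine($"{m.Groups[1].Value}=\"{m.Groups[2].Value}\"");
}
Console.WriteLine($"Occurrences: {hardCodedCount}");
```

For a whole file, the string would come from File.ReadAllText instead of the inline sample.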
I need to figure out a good way, using C#, to find (NULL) in the tag names of an XML file and replace it with the word BAD.
For example:
<GC5_(NULL) DIRTY="False"></GC5_(NULL)>
should be replaced with
<GC5_BAD DIRTY="False"></GC5_BAD>
Part of the problem is that I have no control over the original XML; I just need to fix it once I receive it. The second problem is that (NULL) can appear in zero, one, or many tags. It seems to depend on whether users fill in additional fields. So I might get
<GC5_(NULL) DIRTY="False"></GC5_(NULL)>
or
<MH_OTHSECTION_TXT_(NULL) DIRTY="False"></MH_OTHSECTION_TXT_(NULL)>
or
<LCDATA_(NULL) DIRTY="False"></LCDATA_(NULL)>
I am a newbie to C# and programming.
EDIT:
So I have come up with the following function, which, while not pretty, works so far.
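// Requires: using System; using System.Collections.Generic; using System.Text; using System.Text.RegularExpressions;
public static string CleanInvalidXmlChars(string fileText)
{
// Remove control characters that are not allowed in XML.
List<char> charsToSubstitute = new List<char>();
charsToSubstitute.Add((char)0x19);
charsToSubstitute.Add((char)0x1C);
charsToSubstitute.Add((char)0x1D);
foreach (char c in charsToSubstitute)
fileText = fileText.Replace(Convert.ToString(c), string.Empty);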
StringBuilder b = new StringBuilder(fileText);
b.Replace("", string.Empty);
b.Replace("", string.Empty);
b.Replace("<(null)", "<BAD");
b.Replace("(null)>", "BAD>");
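// Rename tags such as <GC5_(NULL) ...> to <GC5_BAD ...>.
Regex nullMatch = new Regex("<(.+?)_\\(NULL\\)(.+?)>");
String result = nullMatch.Replace(b.ToString(), "<$1_BAD$2>");
// Catch-all for any (NULL) the regex above did not replace.
result = result.Replace("(NULL)", "BAD");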
return result;
}
I have only been able to find 6 or 7 bad XML files to test this code on, but it has worked on each of them and not removed good data. I appreciate the feedback and your time.
In general, regular expressions are not the right way of handling XML files. There's a range of solutions to handle XML files correctly - you can read up on System.Xml.Linq for a good start. If you're a newbie, it's certainly something you should learn at some point. As Ed Plunkett pointed out in the comments, though, your XML is not actually XML: ( and ) characters are not allowed in XML element names.
Since you will have to do it as an operation on a string, Corak's comment to use
contentOfXml.Replace("(NULL)", "BAD");
may be a good idea, but it will break if (NULL) can appear anywhere other than in an element name, for example in text content or attribute values.
If you want a regex approach, this might work decently, but I'm not sure it covers every edge case:
// [^<>()]* allows several underscores in the name, e.g. MH_OTHSECTION_TXT_(NULL)
var regex = new Regex(@"(<\/?[^<>()]*_)\(NULL\)([^>]*>)");
var result = regex.Replace(contentOfXml, "$1BAD$2");
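Would it work for you to read this XML as a string and perform a regex replacement? For example:
// [^<>] keeps each match inside a single tag; the trailing * also lets closing tags such as </GC5_(NULL)> match.
Regex nullMatch = new Regex("<([^<>]+?)_\\(NULL\\)([^<>]*)>");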
String processedXmlString = nullMatch.Replace(originalXmlString, "<$1_BAD$2>");
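Here is a self-contained sketch of the regex replacement, run against the three sample tags from the question. It assumes (NULL) only ever appears at the end of an element name, right after an underscore; the exact character class is my own choice, not taken from the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question (all three shapes of bad tag).
string xml =
    "<GC5_(NULL) DIRTY=\"False\"></GC5_(NULL)>" +
    "<MH_OTHSECTION_TXT_(NULL) DIRTY=\"False\"></MH_OTHSECTION_TXT_(NULL)>" +
    "<LCDATA_(NULL) DIRTY=\"False\"></LCDATA_(NULL)>";

// Group 1: '<' or '</' plus the tag name up to the '_' right before (NULL).
// [^<>()\s]* never leaves the tag and allows several underscores in the name.
// Group 2: whatever follows (NULL), up to and including the tag's closing '>'.
var nullTag = new Regex(@"(</?[^<>()\s]*_)\(NULL\)([^<>]*>)");

string fixedXml = nullTag.Replace(xml, "$1BAD$2");
Console.WriteLine(fixedXml);
```

Opening and closing tags are both handled by the optional '/', so no separate catch-all Replace is needed for this input.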
I will always have a string like this:
"/FirstWord/ImportantWord/ThirdWord"
How can I extract ImportantWord? Words can contain at most one space, and they are separated by forward slashes as shown above, for example:
"/Folder/Second Folder/Content"
"/Main folder/Important/Other Content"
I always want to get the second word (Second Folder and Important in the examples above).
How about this:
string ImportantWord = path.Split('/')[2]; // Index 2 will give the required word
You may not need String.Split (with specific characters) or regular expressions at all. Since the inputs are well-formed directory paths, you can use the Directory.GetParent method of the System.IO.Directory class, which returns the parent directory as a DirectoryInfo. For three-segment paths like these, its Name property is the required text.
You can use it like this:
using System.IO;
string pathFirst = "/Folder/Second Folder/Content";
string pathSecond = "/Main folder/Important/Other Content";
string reqWord1 = Directory.GetParent(pathFirst).Name; // will give you Second Folder
string reqWord2 = Directory.GetParent(pathSecond).Name; // will give you Important
Additional note: Directory.GetParent can be nested if you need the name at another level.
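Going up one more level can also be done through the DirectoryInfo.Parent property. A small sketch, assuming the same path format as in the question; like GetParent itself, it counts levels from the end of the path and never touches the file system.

```csharp
using System;
using System.IO;

string path = "/Folder/Second Folder/Content";

// GetParent resolves the path to a full path and takes its directory part;
// the folders do not have to exist.
DirectoryInfo? parent = Directory.GetParent(path);
string? secondLevel = parent?.Name;          // one level up from "Content"
string? firstLevel = parent?.Parent?.Name;   // two levels up from "Content"

Console.WriteLine(secondLevel);
Console.WriteLine(firstLevel);
```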
Also you may try this:
var stringValue = "/FirstWord/ImportantWord/ThirdWord";
var item = stringValue.Split('/').Skip(2).First(); //item: ImportantWord
There are several ways to solve this. The simplest one is String.Split:
char delimiter = '/';
string[] substrings = value.Split(delimiter);
// The leading '/' yields an empty first element, so the second word is at index 2.
string secondWord = substrings[2];
(You may want to check the input first to make sure it is in the right format; otherwise you may get an IndexOutOfRangeException.)
Another way is to use a regex, since the pattern (words separated by /) is simple.
If you are sure this is a path, you can use one of the other answers mentioned here.
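The input check suggested above could look like the following sketch. SecondSegment is a hypothetical helper name; it drops the empty entry produced by the leading slash instead of relying on a fixed index, and returns null when there is no second word.

```csharp
using System;

// Hypothetical helper: returns the second segment of a path like
// "/First/Second/Third", or null when there are fewer than two segments.
static string? SecondSegment(string path)
{
    // RemoveEmptyEntries discards the empty string before the leading '/'.
    string[] parts = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
    return parts.Length >= 2 ? parts[1] : null;
}

Console.WriteLine(SecondSegment("/Folder/Second Folder/Content"));
Console.WriteLine(SecondSegment("/Main folder/Important/Other Content"));
Console.WriteLine(SecondSegment("/OnlyOne") ?? "(bad input)");
```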
I am trying to parse through some log files and put them into a database for analysis. A single line looks something like this:
2012-09-30 17:16:27,213 [39] (boxes) ERROR Assembly.Places [(null)] - Error while displaying a thing
I have made a regular expression that works well for pulling out the date in front and breaking up the lines that way, but I lose the date itself. This is a pretty important bit of data, and I don't want to lose it!
I cannot just split on \r\n, because some log entries are fatal errors that include stack traces for the developers, and those, obviously, use \r\n to make them readable.
My current code looks like this for reference:
var logpath = Directory.GetFiles(@"C:\a\directory", "*.log");
foreach (var log in logpath)
{
var fileStream = new StreamReader(log);
var fileString = fileStream.ReadToEnd();
var records = Regex.Split(fileString, "[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}");
...
}
Split() will always remove the matched delimiter. The trick is not to match any actual text, but rather a position in the string.
This is done through zero-width look-ahead:
var datePattern = "^(?=[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})";
var datePositions = new Regex(datePattern, RegexOptions.Multiline);
// ...
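// Call Split on the Regex instance; the static Regex.Split overloads take a pattern string, not a Regex.
var records = datePositions.Split(fileString);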
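A runnable sketch of the look-ahead split on an in-memory sample. The two log records below are made up in the question's format; the second one carries a short stack trace to show that multi-line entries stay together.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up sample: two records, the second spanning several lines.
string fileString =
    "2012-09-30 17:16:27,213 [39] (boxes) ERROR Assembly.Places [(null)] - Error while displaying a thing\r\n" +
    "2012-09-30 17:16:28,001 [39] (boxes) FATAL Assembly.Places [(null)] - Crash\r\n" +
    "   at Some.Method()\r\n" +
    "   at Other.Method()\r\n";

// ^ plus a look-ahead matches a position (zero width) at the start of any line
// that begins with a timestamp, so Split keeps the timestamp in each record.
var datePositions = new Regex(
    @"^(?=[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})",
    RegexOptions.Multiline);

string[] records = datePositions
    .Split(fileString)
    .Select(r => r.TrimEnd())   // drop the trailing \r\n of each record
    .Where(r => r.Length > 0)   // a split at offset 0 yields an empty first entry
    .ToArray();

foreach (string record in records)
    Console.WriteLine("--- " + record);
```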
You should match instead of splitting.
This is the regex; use Singleline mode:
([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})(.*?)((?=[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}|$))
Group 1 contains date
Group 2 contains the rest of the record (the log data)
NOTE
Conceptually, the regex looks like this:
(yourDate)(.*?yourdata)(?=till the other date|$)
Don't forget to use Singleline mode (RegexOptions.Singleline), so that . also matches line breaks.
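A sketch of this match-based approach on a made-up two-record sample. The date pattern is the one from the question; collecting the results into a list of (Date, Body) tuples and trimming the body are my own choices.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string fileString =
    "2012-09-30 17:16:27,213 [39] (boxes) ERROR Assembly.Places [(null)] - Error while displaying a thing\n" +
    "2012-09-30 17:16:28,001 [39] (boxes) FATAL Assembly.Places [(null)] - Crash\n" +
    "   at Some.Method()\n";

const string DatePattern = @"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}";

// (date)(lazy body)(?=next date|end). Singleline lets '.' cross line breaks,
// so a multi-line stack trace stays in the body of its own record.
var recordPattern = new Regex($"({DatePattern})(.*?)(?={DatePattern}|$)", RegexOptions.Singleline);

var records = new List<(string Date, string Body)>();
foreach (Match m in recordPattern.Matches(fileString))
    records.Add((m.Groups[1].Value, m.Groups[2].Value.Trim()));

foreach (var r in records)
    Console.WriteLine($"{r.Date} | {r.Body}");
```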
Well, I'm not an expert on the subject, but I did find this: Regex.Match.
From what I see, you can get the first match of the date format as a Match object,
which has all kinds of useful properties (such as Index and Length) that, put together, probably let you cut out the parts you want.
P.S. There is also Regex.Matches, which returns all matches in the file; that might be easier to use.
Sorry, I don't have time to put together a complete code example.
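For what it's worth, a minimal sketch of the Regex.Matches idea: each match's Index marks where a record starts, so the text between consecutive matches is one record. The sample text is made up in the question's format.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string fileString =
    "2012-09-30 17:16:27,213 [39] (boxes) ERROR Assembly.Places [(null)] - Error while displaying a thing\n" +
    "2012-09-30 17:16:28,001 [39] (boxes) FATAL Assembly.Places [(null)] - Crash\n" +
    "   at Some.Method()\n";

// Multiline: ^ only accepts timestamps at the start of a line.
var dateRegex = new Regex(@"^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}",
                          RegexOptions.Multiline);
MatchCollection dates = dateRegex.Matches(fileString);

var records = new List<string>();
for (int i = 0; i < dates.Count; i++)
{
    // A record runs from its own timestamp to the next one (or to the end of the text).
    int start = dates[i].Index;
    int end = i + 1 < dates.Count ? dates[i + 1].Index : fileString.Length;
    records.Add(fileString.Substring(start, end - start).TrimEnd());
}

foreach (string record in records)
    Console.WriteLine(record);
```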
Good day.
I have the following string:
http://www.powerwXXe.com/text1 123-456 text2 text3/
Can someone give me advice on how to get the value of text1, text2 and text3 and put them into a string. I have heard of regular expressions but have no idea how to use them.
Instead of going the regex route, if you know that the string will always have a similar format, you can use string.Split, first on /, then on spaces, and retrieve the results from the resulting string arrays.
string[] slashes = myString.Split('/');
string[] textVals = slashes[3].Split(' ');
// at this point:
// textVals[0] = "text1"
// textVals[1] = "123-456"
// textVals[2] = "text2"
// textVals[3] = "text3"
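The same steps as a runnable snippet, which also puts the three values into one string as the question asks. Joining them with single spaces is an assumption, since the desired separator isn't stated.

```csharp
using System;

string myString = "http://www.powerwXXe.com/text1 123-456 text2 text3/";

// Splitting on '/' gives "http:", "", "www.powerwXXe.com", "text1 123-456 text2 text3", "".
string[] slashes = myString.Split('/');
// Splitting the fourth part on ' ' gives "text1", "123-456", "text2", "text3".
string[] textVals = slashes[3].Split(' ');

// Skip the 123-456 part and combine the three text values.
string combined = string.Join(" ", textVals[0], textVals[2], textVals[3]);
Console.WriteLine(combined);
```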
Here is a link on getting started with regular expressions in C#: Regular Expression Tutorial
I don't think it is appropriate to write out a tutorial here since the information is online, so please check out the link and let me know if you have a specific question.
Instead of using regex, you can use string.Format("http://myurl.com/{0}{1}{2}", value1, textbox2.Text, textbox3.Text) and format the URL however you like. If you want to go the regex route, you can always check regexlib.
The use of regular expressions relies on patterns you see in your strings - you need to be able to generalize the pattern of strings you're looking for before you can use a regular expression.
For a problem of this scope, if you can pin down the pattern, you're probably better off using other string parsing methods, such as String.IndexOf and String.Split.
Regular expressions are a powerful tool, and certainly worth learning, but they might not be necessary here.
Based on the example you gave, it looks as though text1, text2 and text3 are separated by spaces? If so, and if you always know the positions they'll be in, you may want to skip regular expressions and just use .Split(' ') to split the string into an array of strings and then grab the pertinent items from there. Something like this:
string foo = "http://www.powerwXXe.com/text1 123-456 text2 text3/";
string[] fooParts = foo.Split(' ');
string text1 = fooParts[0].Replace("http://www.powerwXXe.com/", "");
string text2 = fooParts[2];
string text3 = fooParts[3].Replace("/", "");
You'd want to perform bounds checking on the string[] before trying to grab anything from it, but this would work. Regex is awesome for string parsing, but when it's simple stuff you need to do, sometimes it's overkill when simple methods from the string class will do.
It all depends on how much you know about the string you are parsing. Where does the string come from, and how much do you know about its formatting?
Based on your example string you could get away with something as simple as
string pattern = @"http://www.powerwXXe.com/(?<myGroup1>\S+)\s\S+\s(?<myGroup2>\S+)\s(?<myGroup3>\S+)/";
var reg = new System.Text.RegularExpressions.Regex(pattern);
string input = "http://www.powerwXXe.com/text1 123-456 text2 text3/";
System.Text.RegularExpressions.Match myMatch = reg.Match(input);
The captured strings would then be in myMatch.Groups["myGroup1"], ["myGroup2"] and ["myGroup3"] respectively (read .Value to get the text).
This, however, assumes that your string always begins with http://www.powerwXXe.com/, that there will always be three groups to capture, and that the groups are separated by a space (which is not allowed in URLs and would in almost all cases be encoded as %20, which would have to be accounted for in the pattern).
So, how much do you know about your string? And, as others have already asked, do you really need regular expressions?
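Reading the named groups back might look like this sketch. Escaping the dots in the host name (so they only match a literal '.') is my addition; the group names are the ones from the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @"http://www\.powerwXXe\.com/(?<myGroup1>\S+)\s\S+\s(?<myGroup2>\S+)\s(?<myGroup3>\S+)/";
string input = "http://www.powerwXXe.com/text1 123-456 text2 text3/";

Match myMatch = Regex.Match(input, pattern);

// Groups can be indexed by name; .Value is the captured text ("" if the match failed).
string text1 = myMatch.Groups["myGroup1"].Value;
string text2 = myMatch.Groups["myGroup2"].Value;
string text3 = myMatch.Groups["myGroup3"].Value;

Console.WriteLine(myMatch.Success ? string.Join(" ", text1, text2, text3) : "no match");
```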
I need some help extracting the following bits of information using regular expressions.
Here is my input string "C:\Yes"
(Note: the real string also contains a character that looks like a weird superscript L at the start and in the middle, but not at the end; it is missing from the string above.)
I need to extract "C:\" into one string and "Yes" into another.
Thanks in advance.
I wouldn't bother with regular expressions for that. Too much work, and I'd be too likely to screw it up.
var x = @"C:\Yes";
var root = Path.GetPathRoot(x); // => @"C:\"
var file = Path.GetFileName(x); // => "Yes"
The following regular expression returns C:\ in the first capture group and the rest in the second:
^(\w:\\)(.*)$
This is looking for a full string (^…$) starting with a letter (\w, although [A-Za-z] would probably be more accurate for Windows drive letters), followed by :\. Everything else (.*) is captured in the second group.
Notice that this won’t work with UNC paths. If you’re working with paths, your best bet is not to use strings and regular expressions but rather the API found in System.IO. The classes found there already offer the functionality that you want.
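A small runnable check of that expression, assuming a plain drive-letter path like the one in the question. Note the verbatim string: without @, the pattern's \\ would have to be written as \\\\.

```csharp
using System;
using System.Text.RegularExpressions;

// In a verbatim string, \\ reaches the regex engine unchanged and matches one literal backslash.
var rootAndRest = new Regex(@"^(\w:\\)(.*)$");

Match m = rootAndRest.Match(@"C:\Yes");
string root = m.Groups[1].Value;  // drive root
string rest = m.Groups[2].Value;  // everything after the root

Console.WriteLine(root);
Console.WriteLine(rest);
```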
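// Verbatim string so the regex sees \\ (one literal backslash); a normal string would need "\\\\".
Regex r = new Regex(@"([A-Z]:\\)([A-Za-z]+)");
Match m = r.Match(@"C:\Yes");
string val1 = m.Groups[1].Value; // "C:\" (Groups[0] is the whole match)
string val2 = m.Groups[2].Value; // "Yes"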