Find and replace question regarding RegEx.Replace - c#

I have a text file and I want to be able to change all instances of:
T1M6 to N1T1M6
The T will always be a different value depending on the text file loaded. So, for example, it could sometimes be
T2M6 and that would need to be turned into N2T2M6. The N(value) must match the T(value). The M6 will always be M6.
Another example:
T9M6 would translate to N9T9M6
Here is my code to do the loading of the text file:
StreamReader reader = new StreamReader(fDialog.FileName.ToString());
string content = reader.ReadToEnd();
reader.Close();
Here is the Regex.Replace statement that I came up with. I'm not sure if it is right.
content = Regex.Replace(content, @"(T([-\d.]))M6", "N1$1M6");
It seems to work at searching for T5M6 and turning it into N1T5M6.
But I am unsure how to turn the N(value) into the value that T is. For example N5T5M6.
Can someone please show me how to modify my code to handle this?
Thanks.

Like this:
string content = File.ReadAllText(fDialog.FileName.ToString());
content = Regex.Replace(content, @"T([-\d.])M6", "N$1T$1M6");
Also, you should probably replace [-\d.] with \d or -?\d\.?
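As a minimal sketch of the same replacement, assuming the T value is always one or more plain digits (so \d+ is used instead of [-\d.]); the class and method names here are hypothetical and only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ToolChangeRewriter
{
    // Hypothetical helper: prefixes every T<n>M6 with N<n>, reusing the captured digits.
    public static string AddBlockNumbers(string content)
    {
        // ${n} is the named-group form of $1; the braces keep the group
        // reference clearly separated from the literal "T" that follows it.
        return Regex.Replace(content, @"T(?<n>\d+)M6", "N${n}T${n}M6");
    }

    public static void Main()
    {
        // Each T value is copied into the new N prefix.
        Console.WriteLine(AddBlockNumbers("T1M6 T9M6 T12M6"));
    }
}
```

Using \d+ rather than a single character class also covers multi-digit values such as T12M6, which the single-character pattern would not match.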

c# remove (null) from XML tags

I need to figure out a good way using C# to parse an XML file for (NULL) and remove it from the tags and replace it with the word BAD.
For example:
<GC5_(NULL) DIRTY="False"></GC5_(NULL)>
should be replaced with
<GC5_BAD DIRTY="False"></GC5_BAD>
Part of the problem is I have no control over the original XML, I just need to fix it once I receive it. The second problem is that the (NULL) can appear in zero, one, or many tags. It appears to be an issue with users filling in additional fields or not. So I might get
<GC5_(NULL) DIRTY="False"></GC5_(NULL)>
or
<MH_OTHSECTION_TXT_(NULL) DIRTY="False"></MH_OTHSECTION_TXT_(NULL)>
or
<LCDATA_(NULL) DIRTY="False"></LCDATA_(NULL)>
I am a newbie to C# and programming.
EDIT:
So I have come up with the following function that, while not pretty, works so far.
public static string CleanInvalidXmlChars(string fileText)
{
List<char> charsToSubstitute = new List<char>();
charsToSubstitute.Add((char)0x19);
charsToSubstitute.Add((char)0x1C);
charsToSubstitute.Add((char)0x1D);
foreach (char c in charsToSubstitute)
fileText = fileText.Replace(Convert.ToString(c), string.Empty);
StringBuilder b = new StringBuilder(fileText);
b.Replace("", string.Empty);
b.Replace("", string.Empty);
b.Replace("<(null)", "<BAD");
b.Replace("(null)>", "BAD>");
Regex nullMatch = new Regex("<(.+?)_\\(NULL\\)(.+?)>");
String result = nullMatch.Replace(b.ToString(), "<$1_BAD$2>");
result = result.Replace("(NULL)", "BAD");
return result;
}
I have only been able to find 6 or 7 bad XML files to test this code on, but it has worked on each of them and not removed good data. I appreciate the feedback and your time.
In general, regular expressions are not the right way of handling XML files. There's a range of solutions to handle XML files correctly - you can read up on System.Xml.Linq for a good start. If you're a newbie, it's certainly something you should learn at some point. As Ed Plunkett pointed out in the comments, though, your XML is not actually XML: ( and ) characters are not allowed in XML element names.
Since you will have to do it as an operation on a string, Corak's comment to use
contentOfXml.Replace("(NULL)", "BAD");
may be a good idea, but will break if any elements can contain the string (NULL) as anything other than their name.
If you want a regex approach, this might work decently, but I'm not sure if it's not missing any edge cases:
var regex = new Regex(@"(<\/?[^_]*_)\(NULL\)([^>]*>)");
var result = regex.Replace(contentOfXml, "$1BAD$2");
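One edge case worth noting: [^_]* stops at the first underscore, so a name such as MH_OTHSECTION_TXT_(NULL) is not matched by the pattern above. The following is a hedged sketch of a variant that only touches (NULL) when it is part of an element name. It assumes names contain no whitespace and that (NULL) appears at most once per name; the class and method names are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NullTagFixer
{
    // "<" or "</", then the leading part of an element name
    // (no whitespace, angle brackets, or parentheses), then the literal "(NULL)".
    private static readonly Regex NullInName =
        new Regex(@"(</?[^\s<>()]*)\(NULL\)");

    // Hypothetical helper: replaces (NULL) with BAD inside element names only.
    public static string Fix(string xmlText)
    {
        return NullInName.Replace(xmlText, "$1BAD");
    }

    public static void Main()
    {
        string input =
            "<MH_OTHSECTION_TXT_(NULL) DIRTY=\"False\">keep (NULL) here</MH_OTHSECTION_TXT_(NULL)>";
        // Only the opening and closing tag names change; the text content is left alone.
        Console.WriteLine(Fix(input));
    }
}
```

Because the match must start at "<" and cannot cross whitespace, (NULL) inside attribute values or element text is not replaced.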
Will it be suitable for you to read this XML as a string and perform a regex replacement? Like:
Regex nullMatch = new Regex("<(.+?)_\\(NULL\\)(.+?)>");
String processedXmlString = nullMatch.Replace(originalXmlString, "<$1_BAD$2>");

C# HTML scraping between tags

I'm trying to build a Skype tool with a "dictionary" command that retrieves the meaning of a word from Urban Dictionary. At the moment I'm able to load the whole HTML document into a string like this:
private void urbanDictionary(string term)
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.urbandictionary.com/define.php?term=" + term);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader stream = new StreamReader(response.GetResponseStream());
string final_response = stream.ReadToEnd();
MessageBox.Show(final_response);
}
The problem is that I only want the meaning, which looks like this:
<div class='meaning'> "meaning" </div>
I have tried all kinds of things, but I can't manage to retrieve the text between the div tags.
How could I do this?
Use the HtmlAgilityPack library, exactly what you need.
http://www.codeproject.com/Articles/659019/Scraping-HTML-DOM-elements-using-HtmlAgilityPack-H
I suggest this: in the final_response string, first find the index of <div class='meaning'>, then create a substring from that index + "<div class='meaning'>".Length to the end of the string. Then, in that substring, find the index of "</div>" and use it to take another substring containing the text between the div tags.
Example:
If you find <div class='meaning'> at index 100, create a substring from 100+38 to the end.
This substring will look like "meaning" </div>.
Then find the index of </div>; let's assume it is 10. The substring from 0 to (10 - 1) will give meaning as the output.
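The steps above can be sketched roughly as follows. This assumes the page contains the literal opening tag <div class='meaning'> exactly as shown in the question and that the meaning contains no nested div; the class and method names are hypothetical.

```csharp
using System;

public static class MeaningExtractor
{
    // Hypothetical helper: returns the text between the first
    // <div class='meaning'> and the next </div>, or null if either is missing.
    public static string Extract(string html)
    {
        const string openTag = "<div class='meaning'>";
        const string closeTag = "</div>";

        int start = html.IndexOf(openTag, StringComparison.Ordinal);
        if (start < 0) return null;
        start += openTag.Length; // skip past the opening tag itself

        int end = html.IndexOf(closeTag, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return html.Substring(start, end - start).Trim();
    }

    public static void Main()
    {
        Console.WriteLine(Extract("<p>x</p><div class='meaning'> some text </div>"));
    }
}
```

This kind of string slicing is fragile: it breaks as soon as the markup changes (different quotes, extra attributes, nested tags), which is why an HTML parser is usually the safer choice.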
Maybe not the answer you're looking for, but I used https://www.mashape.com to get an API for Urban Dictionary. Unfortunately it's unofficial, so I don't know how long it will keep working. But as the comments already mentioned, the HTML could also change at any time, most likely more often than an API. The API also consumes less bandwidth, which should always be preferred.
Usage would be
var client = new WebClient();
client.Headers.Add("X-Mashape-Key", "APIKEY");
client.Headers.Add("Accept", "text/plain");
Console.WriteLine(client.DownloadString("https://mashape-community-urban-dictionary.p.mashape.com/define?term="+ term));
There are two options.
1) You can use Regex to remove the HTML tags. This is short and sweet and you can use it if the HTML source you are dealing with is not complex.
string meaningStr = Regex.Replace(final_response, @"<[^>]+>", "").Trim();
You can find the above solution tested live at: regexstorm.net/tester
2) You can use HTMLAgilityPack . This method is recommended but needs you to expend some effort setting it up. With Nuget, it's not that difficult.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(final_response);
// InnerText lives on the root node, not on HtmlDocument itself
final_response = doc.DocumentNode.InnerText;

How to extract string from java properties

Here is the java properties content
xxx_error_tx1 = This is xxxx. Johe say:
xxx_error_MapCode = xxx_error_tx1, test this function,Failed,\
Default, Current,\
App_Error_tx1
I need to extract the string ID and the string content. I can extract the content of the first line correctly, but for the second entry only the first part, xxx_error_tx1, test this function,Failed,\, is extracted. The rest of the string is not extracted.
The regex is (?<ID>.+?)=(?<Translation>.+?)$. I know this regex has a problem; I've tried to correct the pattern, but I'm a newbie and the result still doesn't do what I need.
Any help would be appreciated.
Seems like you want something like this,
(?<ID>.+?)=(?<Translation>(?:(?!\S+\s*=)[\s\S])+)
DEMO
(?:(?!\S+\s*=)[\s\S])+ Matches one or more space or non-space characters which won't contain the string which was matched by this \S+\s*= pattern.
Try this. It correctly includes the whole value when the value is split across multiple lines, but stops before the line that follows.
(?<ID>.+?)=(?<Translation>(?:.*\\\s)*.*)
DEMO
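For illustration, here is a hedged sketch of how a continuation-aware pattern could be used from C#. It is a variant of the patterns above rather than either answer's exact regex: it anchors each entry at the start of a line, accepts both \n and \r\n line endings, and afterwards joins continuation lines by dropping the trailing backslash, the line break, and the leading whitespace of the next line. The class and method names are made up for this example, and it does not implement the full Java properties format (comments, ":" separators, escapes).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PropertiesParser
{
    // One entry: key, "=", then any number of lines ending in a backslash,
    // followed by the final (non-continued) line of the value.
    private static readonly Regex Entry = new Regex(
        @"^(?<ID>[^=\r\n]+?)[ \t]*=[ \t]*(?<Translation>(?:[^\r\n]*\\\r?\n)*[^\r\n]*)",
        RegexOptions.Multiline);

    // A backslash at the end of a line plus the next line's leading whitespace.
    private static readonly Regex Continuation = new Regex(@"\\\r?\n[ \t]*");

    // Hypothetical helper: maps each ID to its value with continuations joined.
    public static Dictionary<string, string> Parse(string text)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Entry.Matches(text))
        {
            string value = Continuation.Replace(m.Groups["Translation"].Value, "");
            result[m.Groups["ID"].Value] = value;
        }
        return result;
    }

    public static void Main()
    {
        string text =
            "xxx_error_tx1 = This is xxxx. Johe say:\n" +
            "xxx_error_MapCode = xxx_error_tx1, test this function,Failed,\\\n" +
            "Default, Current,\\\n" +
            "App_Error_tx1\n";

        foreach (var pair in Parse(text))
            Console.WriteLine(pair.Key + " => " + pair.Value);
    }
}
```

If the raw value including the backslashes and line breaks is needed instead, the Continuation step can simply be skipped.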

How to select something within an XML attribute?

I am currently attempting to replace a certain string in an xml document. I am doing this through Visual Studio using C#. The exact string I want to replace is Data Source = some-host to Data Source = local-host. The string is located under an attribute to my Strings. However, the attribute connectionString has many values under it.
<Strings>
<add name="Cimbrian.Data.ConnectionString" connectionString="Data Source=some-host;Integrated Security=false;pooling=true;Min Pool Size=5;Max Pool Size=400;Connection Timeout=5;"/>
I have managed to be able to select and replace the entire values for both name and connectionString however I want to be able to select JUST the Data Source = some-host to replace.
After loading the document my code currently looks like this,
XmlNode ConnectNode = Incident.SelectSingleNode("//Strings");
XmlNode add1 = ConnectNode.FirstChild;
add1.Attributes[1].Value = "THIS REPLACES ALL OF CONNECTION STRING";
But as the string value suggests, it is replacing far more than I want it to. Any help would be appreciated. Apologies if that is slightly hard to follow.
EDIT - I forgot to mention that if possible I want to do this without searching for the specific string Data Source = some-host due to the fact that the some-host part may change, and I still want to be able to edit the value without having to change my code.
This has really nothing to do with XML - the fact that the value of the attribute is itself a semi-colon-separated list is irrelevant as far as XML is concerned. You'd have the same problem if you had the connection string on its own.
You can use SqlConnectionStringBuilder to help though:
var builder = new SqlConnectionStringBuilder(currentConnectionString);
builder.DataSource = "some other host";
string newConnectionString = builder.ToString();
This means you don't need to rely on the current exact value of some-host (and spacing) which you will do if you just use string.Replace.
If you know exactly what you would be replacing you could use the replace method:
string string2 = string1.Replace("x", "y");
This would find all instances of x and replace them with y in string1
EDIT:
Your specific code would look something like this:
add1.Attributes[1].Value = add1.Attributes[1].Value.Replace("Data Source = some-host","Data Source = local-host");
EDIT 2:
Okay, based on your comment, I would split the string on the semicolons, iterate to find the Data Source entry, modify it, and then concatenate everything back together.
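A rough sketch of that split-and-rejoin idea, assuming no value in the connection string contains a quoted semicolon (if that can happen, a connection string builder class is the safer route). The class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class ConnectionStringEditor
{
    // Hypothetical helper: replaces the value of the "Data Source" entry,
    // whatever the current host is, and leaves every other entry untouched.
    public static string SetDataSource(string connectionString, string newHost)
    {
        var parts = connectionString
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(part =>
            {
                int eq = part.IndexOf('=');
                string key = eq < 0 ? part : part.Substring(0, eq);
                // Trim so that "Data Source = x" and "Data Source=x" are both recognized.
                bool isDataSource =
                    key.Trim().Equals("Data Source", StringComparison.OrdinalIgnoreCase);
                return isDataSource ? "Data Source=" + newHost : part;
            });

        // Rejoin the entries; this sketch always ends the result with a semicolon.
        return string.Join(";", parts) + ";";
    }

    public static void Main()
    {
        string original =
            "Data Source=some-host;Integrated Security=false;pooling=true;Connection Timeout=5;";
        Console.WriteLine(SetDataSource(original, "local-host"));
    }
}
```

The result could then be assigned back with add1.Attributes[1].Value = ConnectionStringEditor.SetDataSource(add1.Attributes[1].Value, "local-host");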

Match.Value and international characters

UPDATE: May this post be helpful for coders using RichTextBoxes. The Match is correct for a normal string; I did not see this, AND I did not see that "ä" is transformed to "\'e4" in richTextBox.Rtf! So Match.Value is correct; this was human error.
A RegEx finds the correct text but Match.Value is wrong because it replaces the german "ä" with "\'e4"!
Let example_text = "<em>Primär-ABC</em>" and let's use the following code:
String example_text = "<em>Primär-ABC</em>";
Regex em = new Regex(@"<em>[^<]*</em>");
Match emMatch = em.Match(example_text); // Works!
// Match emMatch = em.Match(richTextBox.Rtf); // Fails!
while (emMatch.Success)
{
string matchValue = emMatch.Value;
Foo(matchValue); // ...
emMatch = emMatch.NextMatch(); // advance, otherwise the loop never ends
}
then the emMatch.Value returns "Prim\'e4r-ABC" instead of "Primär-ABC".
The German ä transforms to \'e4!
Because I want to work with the exact string, I would need
emMatch.Value to be Primär-ABC. How do I achieve that?
In what context are you doing this?
string example_text = "<em>Ich bin ein Bärliner</em>";
Regex em = new Regex(@"<em>[^<]*</em>");
Match emMatch = em.Match(example_text);
while (emMatch.Success)
{
Console.WriteLine(emMatch.Value);
emMatch = emMatch.NextMatch();
}
This outputs <em>Ich bin ein Bärliner</em> in my console
The problem probably isn't that you're getting the wrong value back, it's that you're getting a representation of the value that isn't displayed correctly. This can depend on a lot of things. Try writing the value to a text file using UTF8 encoding and see if it still is incorrect.
Edit: Right. The thing is that you are getting the text from a WinForms RichTextBox using the Rtf property. This will not return the text as is, but will return the RTF representation of the text. RTF is not plain text, it's a markup format to display rich text. If you open an RTF document in e.g. Notepad you will see that it has a lot of weird codes in it - including \'e4 for every 'ä' in your RTF document. If you would've used some markup (like bold text, color etc) in the RTF box, the .Rtf property would return that code as well, looking something like {\rtlch\fcs1 \af31507 \ltrch\fcs0 \cf6\insrsid15946317\charrsid15946317 test}
So use the .Text property instead. It will return the actual plain text.