I'm trying to extract a value from a string using regex. The string looks like this:
<faultcode><![CDATA[900015The new password is not long enough. PasswordMinimumLength is 6.]]></faultcode>
I am trying to display only the error message to the end user.
Since you probably want everything between <![CDATA[ and ]]>, this should fit:
<!\[CDATA\[(.+?)\]\]>
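Here is a minimal sketch of using that pattern from C# with Regex.Match. The class and method names are illustrative only. The captured group still contains the leading code 900015. Without RegexOptions.Singleline, the . will not match across line breaks inside the CDATA section.

```csharp
using System.Text.RegularExpressions;

public static class CdataRegexExample
{
    // Pattern from the answer above: lazily captures everything between <![CDATA[ and ]]>.
    private static readonly Regex CdataPattern = new Regex(@"<!\[CDATA\[(.+?)\]\]>");

    // Returns the CDATA content, or null when the input has no CDATA section.
    public static string ExtractCdata(string input)
    {
        Match m = CdataPattern.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```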
The only sensible thing is to load it into an XElement (or XDocument, XmlDocument) and extract the Value of the CDATA node.
XElement e = XElement.Parse(xmlSnippet);
string rawMsg = (e.FirstNode as XCData).Value;
string msg = rawMsg.Substring("900015".Length);
First and foremost, using regex to parse XML / HTML is bad.
Now, by error message I assume you mean the text, not including the numbers. An expression like so would probably do the trick:
\<([^>]+)\><!\[CDATA\[\d*(.*)\]\]>\</\1\>
The error message will be in the second group. This will work with the sample that you have given, but I'd sooner use XDocument or XmlDocument to parse it. If you are using C#, there really isn't a good reason to not use either of those classes.
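The expression above might be used from C# roughly as follows. The class and method names are hypothetical. \1 is a back-reference, so the closing tag must repeat the opening tag's name. \d* skips the leading number, and group 2 holds the message.

```csharp
using System.Text.RegularExpressions;

public static class FaultMessageRegex
{
    // Group 1: element name. \d* consumes the numeric code. Group 2: the message text.
    // \</\1\> only matches a closing tag with the same name as the opening tag.
    private static readonly Regex Pattern =
        new Regex(@"\<([^>]+)\><!\[CDATA\[\d*(.*)\]\]>\</\1\>");

    // Returns the message, or null when the input does not have this shape.
    public static string ExtractMessage(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups[2].Value : null;
    }
}
```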
Updated to correspond with the question edit:
var xml = XElement.Parse(yourString);
var allText = xml.Value;
var stripLeadingNumbers = Regex.Match(allText, @"^\d*(.*)").Groups[1].Value;
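Put together, the XML-parser approach might look like the sketch below. The names are illustrative. XElement.Value returns the text of the CDATA node, and the regex strips whatever digits lead it instead of hard-coding the length of "900015".

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class FaultMessageXml
{
    public static string ExtractMessage(string xmlSnippet)
    {
        // The XML parser unwraps the CDATA section; Value returns its text.
        XElement element = XElement.Parse(xmlSnippet);
        string allText = element.Value;

        // Drop the leading numeric code, whatever its length.
        return Regex.Match(allText, @"^\d*(.*)").Groups[1].Value;
    }
}
```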
I need to figure out a good way using C# to parse an XML file for (NULL) and remove it from the tags and replace it with the word BAD.
For example:
<GC5_(NULL) DIRTY="False"></GC5_(NULL)>
should be replaced with
<GC5_BAD DIRTY="False"></GC5_BAD>
Part of the problem is I have no control over the original XML, I just need to fix it once I receive it. The second problem is that the (NULL) can appear in zero, one, or many tags. It appears to be an issue with users filling in additional fields or not. So I might get
<GC5_(NULL) DIRTY="False"></GC5_(NULL)>
or
<MH_OTHSECTION_TXT_(NULL) DIRTY="False"></MH_OTHSECTION_TXT_(NULL)>
or
<LCDATA_(NULL) DIRTY="False"></LCDATA_(NULL)>
I am a newbie to C# and programming.
EDIT:
So I have come up with the following function which, while not pretty, works so far.
public static string CleanInvalidXmlChars(string fileText)
{
// Control characters that are not allowed in an XML 1.0 document.
List<char> charsToSubstitute = new List<char>();
charsToSubstitute.Add((char)0x19);
charsToSubstitute.Add((char)0x1C);
charsToSubstitute.Add((char)0x1D);
foreach (char c in charsToSubstitute)
fileText = fileText.Replace(Convert.ToString(c), string.Empty);
StringBuilder b = new StringBuilder(fileText);
// Two more removals of unwanted strings went here (search text not shown):
// b.Replace("…", string.Empty);
// Lower-case (null) at the start or end of a tag name.
b.Replace("<(null)", "<BAD");
b.Replace("(null)>", "BAD>");
// Rewrite tag names of the form NAME_(NULL) to NAME_BAD.
Regex nullMatch = new Regex("<(.+?)_\\(NULL\\)(.+?)>");
String result = nullMatch.Replace(b.ToString(), "<$1_BAD$2>");
// Replace any (NULL) the regex did not catch.
result = result.Replace("(NULL)", "BAD");
return result;
}
I have only been able to find 6 or 7 bad XML files to test this code on, but it has worked on each of them and not removed good data. I appreciate the feedback and your time.
In general, regular expressions are not the right way of handling XML files. There's a range of solutions to handle XML files correctly - you can read up on System.Xml.Linq for a good start. If you're a newbie, it's certainly something you should learn at some point. As Ed Plunkett pointed out in the comments, though, your XML is not actually XML: ( and ) characters are not allowed in XML element names.
Since you will have to do it as an operation on a string, Corak's comment to use
contentOfXml.Replace("(NULL)", "BAD");
may be a good idea, but will break if any elements can contain the string (NULL) as anything other than their name.
If you want a regex approach, this might work decently, but I'm not sure it covers every edge case:
var regex = new Regex(@"(<\/?[^_]*_)\(NULL\)([^>]*>)");
var result = regex.Replace(contentOfXml, "$1BAD$2");
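The pattern above appears to miss one edge case. [^_]* cannot skip past an underscore, so a name with several underscores, such as MH_OTHSECTION_TXT_(NULL) from the question, is left unchanged.

The sketch below compares it with a hedged variant. In the variant, the prefix may contain underscores but never >, so a match stays inside a single tag. The variant was only checked against the three samples in the question.

```csharp
using System.Text.RegularExpressions;

public static class NullTagFixer
{
    // Pattern from the answer: the prefix [^_]* cannot contain an underscore.
    public static readonly Regex Original = new Regex(@"(<\/?[^_]*_)\(NULL\)([^>]*>)");

    // Variant: the lazy prefix may contain underscores but not '>', so it cannot leave the tag.
    public static readonly Regex Variant = new Regex(@"(<\/?[^>]*?_)\(NULL\)([^>]*>)");

    // $1 is "<" or "</" plus the name up to the last underscore; $2 is the rest of the tag.
    public static string Fix(Regex pattern, string contentOfXml) =>
        pattern.Replace(contentOfXml, "$1BAD$2");
}
```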
Would it be suitable for you to read this XML as a string and perform a regex replacement? Like:
Regex nullMatch = new Regex("<(.+?)_\\(NULL\\)(.+?)>");
String processedXmlString = nullMatch.Replace(originalXmlString, "<$1_BAD$2>");
I'm using C# to read an XML file and count how many "elements" there are in an XML tag, like this for example...
<Languages>English, Deutsche, Francais</Languages>
There are 3 "elements" inside the Languages tag: English, Deutsche, and Francais. I need to know how to count them and return the number of elements. The contents of the tag may change over time, because the XML file has to expand to accommodate additional languages whenever needed.
If this is not possible, please suggest workarounds for the problem. Thank you.
EDIT: I haven't written the code to read the XML file yet, but I'm also interested in learning how.
EDIT 2: revisions made to question
string xml = @"<Languages>English, Deutsche, Francais</Languages>";
var doc = XDocument.Parse(xml);
string languages = doc.Elements("Languages").FirstOrDefault().Value;
int count = languages.Split(',').Count();
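Plain Split(',').Count() returns 1 for an empty <Languages/>, because splitting an empty string yields one empty entry. It also counts stray empty entries such as a trailing comma.

Below is a slightly more defensive sketch; the class name is illustrative. It assumes <Languages> is the root element, as in the sample.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LanguageCounter
{
    // Counts comma-separated entries in <Languages>, ignoring blank and whitespace-only entries.
    public static int CountLanguages(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        XElement languages = doc.Element("Languages"); // null if the root has another name
        if (languages == null) return 0;

        return languages.Value
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim())
            .Count(s => s.Length > 0);
    }
}
```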
If, as your edits indicate, you're not simply trying to pull comma-separated strings out of an XML element, then your approach to storing the XML in the first place is incorrect. As another poster commented, it should be:
<Languages>
<Language>English</Language>
<Language>Deutsche</Language>
<Language>Francais</Language>
</Languages>
Then, to get the count of languages:
string xml = @"<Languages>
<Language>English</Language>
<Language>Deutsche</Language>
<Language>Francais</Language>
</Languages>";
var doc = XDocument.Parse(xml);
int count = doc.Element("Languages").Elements().Count();
First, an "ideal" solution: do not put more than one piece of information in a single tag. Rather, put each language in its own tag, like this:
<Languages>
<Language>English</Language>
<Language>Deutsche</Language>
<Language>Francais</Language>
</Languages>
If this is not possible, retrieve the content of the tag with multiple languages and split it using allLanguages.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries). Then obtain the count from the length of the resulting array. Without RemoveEmptyEntries, each ", " separator produces an extra empty string, which inflates the count.
OK, but just to be clear, an XML element has a very specific meaning. In fact, the entire code block you have is an XML element.
XElement xElm = new XElement("Languages", "English, Deutsche, Francais");
string[] elements = xElm.Value.Split(",".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
I am reading an XML string with XDocument
XmlReader reader = XmlReader.Create(new StringReader(xmltext));
reader.Read();
XDocument xdoc = XDocument.Load(reader);
Then I grab the content of some tags and put them within tags in a different string.
When I try to Load this string in the same way I did with the first, I get an error "An error occurred while parsing EntityName. Line 1, position 344.".
I think it should parse correctly, since it has been parsed before, so I guess I am missing something here.
I am reading and copying the content of the first XML with (string)i.Element("field").
I am using .NET 4.
When I grab the content of the XML that I want to use for building another XML string, I use (string)i.Element("field"), and this converts my XML into a string. My next XML parse no longer recognizes it as an element. I solved the problem by not using (string) before I read my element, just i.Element("field"), and this works.
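The difference can be sketched as follows. Here item stands in for the question's i, and "wrapper" is a hypothetical element name. Casting to string keeps only the element's text. Passing the XElement keeps its markup, and because the source element already has a parent, LINQ to XML adds a copy of it.

```csharp
using System.Xml.Linq;

public static class CopyFieldExample
{
    // (string)element yields only the concatenated text, so any tags inside "field" are lost;
    // the new XElement stores that text and escapes it again on output.
    public static string CopyAsText(XElement item) =>
        new XElement("wrapper", (string)item.Element("field"))
            .ToString(SaveOptions.DisableFormatting);

    // Passing the XElement copies the whole element (tags, attributes, children).
    public static string CopyAsElement(XElement item) =>
        new XElement("wrapper", item.Element("field"))
            .ToString(SaveOptions.DisableFormatting);
}
```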
It sounds like you've got something like this:
<OriginalDocument>
<Foo>A &amp; B</Foo>
</OriginalDocument>
That A &amp; B represents the text A & B. So when you grab the text from the element, you'll get the string "A & B". If you then use that to build a new element like this:
string foo = "<Foo>" + fooText + "</Foo>";
then you'll end up with invalid XML like this:
<Foo>A & B</Foo>
Basically, you shouldn't be constructing XML in text form. It's not clear what you're really trying to achieve, but you can copy an element from one place to another pretty easily in XElement form; you shouldn't need to build a string and then reparse it.
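The sketch below contrasts the two approaches; the names are illustrative. String concatenation yields XML that fails to parse. Constructing the element with XElement escapes the & automatically, and parsing the result gives back the original text.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class BuildWithXElement
{
    // String concatenation: "A & B" becomes <Foo>A & B</Foo>, which does not parse.
    public static bool ConcatenatedIsValid(string fooText)
    {
        string foo = "<Foo>" + fooText + "</Foo>";
        try { XElement.Parse(foo); return true; }
        catch (XmlException) { return false; }
    }

    // XElement escapes the text for us when it is serialized.
    public static string BuildFoo(string fooText) =>
        new XElement("Foo", fooText).ToString();
}
```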
So after spending hours on this issue:
It turns out that if your XML string contains an unescaped ampersand ("&") or any other character that XML requires to be escaped, reading the XML will always fail.
To solve this, replace the special characters in the text you insert with their escaped forms. Replace "&" first, so the entities produced by the other replacements are not escaped a second time:
yourText = yourText.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;").Replace("\"", "&quot;").Replace("'", "&apos;");
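Here is a small sketch of that escaping as a reusable helper; the class name is illustrative. It is meant for text content only, since running it over a whole XML string would also escape the markup. The second test case shows why & has to go first: an existing "&lt;" in the input is preserved as literal text. System.Security.SecurityElement.Escape performs the same five replacements and is an alternative.

```csharp
public static class XmlTextEscaper
{
    // Escapes the five XML special characters in a piece of text (not in markup).
    // '&' is replaced first so that the '&' in "&lt;", "&gt;", etc. is not escaped again.
    public static string Escape(string text) =>
        text.Replace("&", "&amp;")
            .Replace("<", "&lt;")
            .Replace(">", "&gt;")
            .Replace("\"", "&quot;")
            .Replace("'", "&apos;");
}
```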
I have an xml string that I wish to traverse using LINQ to XML (I have never used this, so wish to learn). However when I try to use
XDocument xDoc = XDocument.Load(adminUsersXML);
var users = from result in xDoc.Descendants("Result")
select new
{
test = result.Element("USER_ID").Value
};
I get an error message saying "Illegal characters in path". Reading up on it, it's because I cannot pass a standard string in this way. Is there a way to use LINQ to XML with a standard string?
Thanks.
My guess is that adminUsersXML is the XML itself rather than a path to a file containing XML. If that's the case, just use:
XDocument doc = XDocument.Parse(adminUsersXML);
As stated on MSDN, you must use the Parse method to create an XDocument from a string.
I think adminUsersXML is not a file but a string containing XML, which should be parsed into an XDocument with XDocument.Parse(adminUsersXML).
How can I replace a certain part of an XML file with a defined string?
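XDocument.Load(string) treats its argument as a file path or URI, whereas XDocument.Parse reads XML text. Below is a minimal sketch of the question's query with Parse. The Results wrapper in the test input is a hypothetical sample. The (string) cast returns null instead of throwing when a Result has no USER_ID.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AdminUsers
{
    // adminUsersXML holds XML text, so it goes to Parse rather than Load.
    public static List<string> GetUserIds(string adminUsersXML)
    {
        XDocument xDoc = XDocument.Parse(adminUsersXML);
        return (from result in xDoc.Descendants("Result")
                select (string)result.Element("USER_ID")).ToList();
    }
}
```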
<tag1></tag1>
<tag2></tag2>
...etc
<soundcard num=0>
<name>test123</name>
</soundcard>
<soundcard num=1>
<name>test123</name>
</soundcard>
<soundcard num=2>
<name>test123</name>
</soundcard>
<tag5></tag5>
I want to replace all the soundcard parts so that the result looks like this:
<tag1></tag1>
<tag2></tag2>
...etc
{0}
<tag5></tag5>
I'm using C# .NET 3.5, and I thought of a regex solution.
If it has to be a regex, your XML file is well-formed, and you know (say, from the DTD) that <soundcard> tags can't be nested, then you can use
(<soundcard.*?</soundcard>\s*)+
and replace all with {0}.
In C#:
resultString = Regex.Replace(subjectString, @"(<soundcard.*?</soundcard>\s*)+", "{0}", RegexOptions.Singleline);
For a quick-and-dirty fix to a one-off problem, I think that's OK. It's not OK to think of regex as the proper tool to handle XML in general.
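Wrapped as a helper, the replacement behaves as shown in the test. The class name is illustrative. Singleline lets . cross newlines, the lazy .*? stops each block at the nearest </soundcard>, and the trailing + collapses a run of consecutive blocks into one {0}. Note that \s* also consumes the whitespace after the last block.

```csharp
using System.Text.RegularExpressions;

public static class SoundcardReplacer
{
    // Replaces each run of consecutive <soundcard>...</soundcard> blocks with "{0}".
    public static string Replace(string subjectString) =>
        Regex.Replace(subjectString, @"(<soundcard.*?</soundcard>\s*)+", "{0}",
                      RegexOptions.Singleline);
}
```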
Personally, I would use LINQ to XML to remove the soundcard elements and replace them with a text node.
Update Apr 16/2010 4:40PM MST
Here's an example of Linq to XML, I'm a bit rusty but it should at least give you an idea of how this is done.
XElement root = XElement.Load("myxml.xml");
var soundcards = (from el in root.Elements() where el.Name == "soundcard" select el).ToList();
var prev_node = soundcards.First().PreviousNode;
// Remove Nodes
foreach(XElement card in soundcards)
card.Remove();
// Build your content here into a variable called newChild
prev_node.AddAfterSelf(newChild);
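Below is a self-contained variant of that idea. It assumes the file is well-formed with a single root element, which the question's sample is not: its attribute values are unquoted. The placeholder is an XText containing "{0}", and the names are illustrative.

It uses AddBeforeSelf on the first card instead of PreviousNode, so it still works when a soundcard is the root's first child, where PreviousNode would be null. The ToList() snapshot keeps the removals from cutting the iteration short. The Remove() extension on IEnumerable<XElement> would be an alternative for the removal loop.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SoundcardLinq
{
    // Replaces all <soundcard> children of root with a single "{0}" text node.
    public static XElement ReplaceSoundcards(XElement root)
    {
        // Snapshot the matches before modifying the tree.
        var soundcards = root.Elements("soundcard").ToList();
        if (soundcards.Count == 0) return root;

        // Put the placeholder where the first card was, then remove every card.
        soundcards[0].AddBeforeSelf(new XText("{0}"));
        foreach (XElement card in soundcards)
            card.Remove();
        return root;
    }
}
```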
My suggestion would be to use an XSLT transformation to replace the tags you want to replace with a single known placeholder tag, and then use String.Replace to swap that placeholder for {0}.
I echo what Johannes said: do NOT try to build regular expressions to do this. As your XML gets more complex, your error rate will increase.