XML Schema for Sitemap binding it together - c#

I have the following code for defining XML Schema. Having a problem in keeping the lines together. Worked fine before.
public static FileContentResult WriteTo(SiteMapFeed feedToFormat)
{
var siteMap = feedToFormat;
//TODO: DO something, next codes are just DEMO
var header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
+ Environment.NewLine + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\""
+ Environment.NewLine + "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
+ Environment.NewLine + "xsi:schemaLocation=\""
+ Environment.NewLine + "http://www.sitemaps.org/schemas/sitemap/0.9"
+ Environment.NewLine + "http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd\">";
var urls = new System.Text.StringBuilder();
foreach (var site in siteMap.Items)
{
urls.Append(string.Format("<url><loc>http://www.{0}/</loc><lastmod>{1}</lastmod><changefreq>{2}</changefreq><priority>{3}</priority></url>", site.Url, site.LastMod, site.ChangeFreq, site.Priority));
}
byte[] fileContent = System.Text.Encoding.UTF8.GetBytes(header + urls + "</urlset>");
return new FileContentResult(fileContent, "text/xml");
}
SO this is now causing the following error:
Where am I doing it wrong? Thanks

Problem is in that "xsi:schemaLocation" bit I think - you can't have multiple lines between quotes in an XML attribute, if my brain remembers correctly. Try changing to:
var header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
+ Environment.NewLine + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\""
+ Environment.NewLine + "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
+ Environment.NewLine + "xsi:schemaLocation=\""
+ "http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd\">" + Environment.NewLine;

Related

Parse XML With Additional String

I need to support parsing xml that is inside an email body but with extra text in the beginning and the end.
I've tried the HTML agility pack but this does not remove the non-xml texts.
So how do I cleanse the string w/c contains an entire xml text mixed with other texts around it?
var bodyXmlPart= #"Hi please see below client <?xml version=""1.0"" encoding=""UTF-8""?>" +
"<ac_application>" +
" <primary_applicant_data>" +
" <first_name>Ross</first_name>" +
" <middle_name></middle_name>" +
" <last_name>Geller</last_name>" +
" <ssn>123456789</ssn>" +
" </primary_applicant_data>" +
"</ac_application> thank you, \n john ";
//How do I clean up the body xml part before loading into xml
//This will fail:
var xDoc = XDocument.Parse(bodyXmlPart);
If you mean that body can contain any XML and not just ac_application. You can use the following code:
var bodyXmlPart = #"Hi please see below client " +
"<ac_application>" +
" <primary_applicant_data>" +
" <first_name>Ross</first_name>" +
" <middle_name></middle_name>" +
" <last_name>Geller</last_name>" +
" <ssn>123456789</ssn>" +
" </primary_applicant_data>" +
"</ac_application> thank you, \n john ";
StringBuilder pattern = new StringBuilder();
Regex regex = new Regex(#"<\?xml.*\?>", RegexOptions.Singleline);
var match = regex.Match(bodyXmlPart);
if (match.Success) // There is an xml declaration
{
pattern.Append(#"<\?xml.*");
}
Regex regexFirstTag = new Regex(#"\s*<(\w+:)?(\w+)>", RegexOptions.Singleline);
var match1 = regexFirstTag.Match(bodyXmlPart);
if (match1.Success) // xml has body and we got the first tag
{
pattern.Append(match1.Value.Trim().Replace(">",#"\>" + ".*"));
string firstTag = match1.Value.Trim();
Regex regexFullXmlBody = new Regex(pattern.ToString() + #"<\/" + firstTag.Trim('<','>') + #"\>", RegexOptions.None);
var matchBody = regexFullXmlBody.Match(bodyXmlPart);
if (matchBody.Success)
{
string xml = matchBody.Value;
}
}
This code can extract any XML and not just ac_application.
Assumptions are, that the body will always contain XML declaration tag.
This code will look for XML declaration tag and then find first tag immediately following it. This first tag will be treated as root tag to extract entire xml.
I'd probably do something like this...
using System.Diagnostics;
using System.Text.RegularExpressions;
namespace Test {
class Program {
static void Main(string[] args) {
var bodyXmlPart = #"Hi please see below client <?xml version=""1.0"" encoding=""UTF-8""?>" +
"<ac_application>" +
" <primary_applicant_data>" +
" <first_name>Ross</first_name>" +
" <middle_name></middle_name>" +
" <last_name>Geller</last_name>" +
" <ssn>123456789</ssn>" +
" </primary_applicant_data>" +
"</ac_application> thank you, \n john ";
Regex regex = new Regex(#"(?<pre>.*)(?<xml>\<\?xml.*</ac_application\>)(?<post>.*)", RegexOptions.Singleline);
var match = regex.Match(bodyXmlPart);
if (match.Success) {
Debug.WriteLine($"pre={match.Groups["pre"].Value}");
Debug.WriteLine($"xml={match.Groups["xml"].Value}");
Debug.WriteLine($"post={match.Groups["post"].Value}");
}
}
}
}
This outputs...
pre=Hi please see below client
xml=<?xml version="1.0" encoding="UTF-8"?><ac_application> <primary_applicant_data> <first_name>Ross</first_name> <middle_name></middle_name> <last_name>Geller</last_name> <ssn>123456789</ssn> </primary_applicant_data></ac_application>
post= thank you,
john

Can't deserialize xml with element starting with 'μ'

I have an xml configuration file:
<Instruments>
<Instrument Name="uEyeFF1" Assembly="Instruments" Class="IDSuEye1240SE">
<Settings>
<IDSuEye1240SESettings>
<Serial>4102801225</Serial>
<µmPerPixel>5.3</µmPerPixel>
<Color>true</Color>
</IDSuEye1240SESettings>
</Settings>
</Instrument>
<!-- more Instruments -->
</Instruments>
and a class to which the <IDSuEye1240SESettings> node is deserialized:
[Serializable]
public class IDSuEye1240SESettings
{
[XmlElement]
public string Serial { get; set; }
[XmlElement]
public double µmPerPixel { get; set; }
[XmlElement]
public bool Color { get; set; }
}
But when deserializing, I get the following error:
'There is an error in XML document (69, 10).' in Utilities.Load() as System.Void
at path hidden 'Name cannot begin with the 'µ' character, hexadecimal value 0xB5. Line 69, position 10.' in System.Xml.Throw() as System.Void
rest of stack trace...
On a different PC, an earlier compiled version of the same application is running, and I am seemingly able to deserialize the xml to the class. But the developer who wrote it is not around.
As far as I know, the working code doing the deserialization should be the same as the current code:
// the <Settings> node's children is value below
_settings = ConvertNode<IDSuEye1240SESettings>(((XmlNode[])value).First())
public T ConvertNode<T>(XmlNode node)
{
MemoryStream ms = new MemoryStream();
StreamWriter sw = new StreamWriter(ms);
sw.Write(node.OuterXml);
sw.Flush();
ms.Position = 0;
XmlSerializer ser = new XmlSerializer(typeof(T));
T result = (T)ser.Deserialize(ms);
return result;
}
Is it possible that there is something different in the compiled code which works to allow the 'µ' character to start an xml element name?
I had found a difference in the first line of the xml files between the PC's:
Working:
<?xml version="1.0"?>
<Instruments
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xsi:schemaLocation="http://www.w3schools.com Instruments.xsd">
Not working:
<?xml version="1.0" encoding="utf-8"?>
<Instruments
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xsi:schemaLocation="http://www.w3schools.com Instruments.xsd">
But I removed the encoding attribute but that only removed the red underline below the 'µ' for the new file open in the IDE. It didn't solve the serialization issue.
The schema files are identical for both PCs, if that matters.
Is it possible to allow the illegal character to start an xml node?
I decompiled System.Xml to find out why this exception might occur and I think I found an answer...
In short
There is no way to crack into it. But it is possible to solve your problem without changing XML.
In long
Unfotunetly XML fifth edition is considerably differs in deifinition of characters which can be used to start tag names. Here is decompiled constants from System.Xml which proves my words:
#if XML10_FIFTH_EDITION
// StartNameChar without ':' -- see Section 2.3 production [4]
const string s_NCStartName =
"\u0041\u005a\u005f\u005f\u0061\u007a\u00c0\u00d6" +
"\u00d8\u00f6\u00f8\u02ff\u0370\u037d\u037f\u1fff" +
"\u200c\u200d\u2070\u218f\u2c00\u2fef\u3001\ud7ff" +
"\uf900\ufdcf\ufdf0\ufffd";
// NameChar without ':' -- see Section 2.3 production [4a]
const string s_NCName =
"\u002d\u002e\u0030\u0039\u0041\u005a\u005f\u005f" +
"\u0061\u007a\u00b7\u00b7\u00c0\u00d6\u00d8\u00f6" +
"\u00f8\u037d\u037f\u1fff\u200c\u200d\u203f\u2040" +
"\u2070\u218f\u2c00\u2fef\u3001\ud7ff\uf900\ufdcf" +
"\ufdf0\ufffd";
#else
const string s_NCStartName =
"\u0041\u005a\u005f\u005f\u0061\u007a" +
"\u00c0\u00d6\u00d8\u00f6\u00f8\u0131\u0134\u013e" +
"\u0141\u0148\u014a\u017e\u0180\u01c3\u01cd\u01f0" +
"\u01f4\u01f5\u01fa\u0217\u0250\u02a8\u02bb\u02c1" +
"\u0386\u0386\u0388\u038a\u038c\u038c\u038e\u03a1" +
"\u03a3\u03ce\u03d0\u03d6\u03da\u03da\u03dc\u03dc" +
"\u03de\u03de\u03e0\u03e0\u03e2\u03f3\u0401\u040c" +
"\u040e\u044f\u0451\u045c\u045e\u0481\u0490\u04c4" +
"\u04c7\u04c8\u04cb\u04cc\u04d0\u04eb\u04ee\u04f5" +
"\u04f8\u04f9\u0531\u0556\u0559\u0559\u0561\u0586" +
"\u05d0\u05ea\u05f0\u05f2\u0621\u063a\u0641\u064a" +
"\u0671\u06b7\u06ba\u06be\u06c0\u06ce\u06d0\u06d3" +
"\u06d5\u06d5\u06e5\u06e6\u0905\u0939\u093d\u093d" +
"\u0958\u0961\u0985\u098c\u098f\u0990\u0993\u09a8" +
"\u09aa\u09b0\u09b2\u09b2\u09b6\u09b9\u09dc\u09dd" +
"\u09df\u09e1\u09f0\u09f1\u0a05\u0a0a\u0a0f\u0a10" +
"\u0a13\u0a28\u0a2a\u0a30\u0a32\u0a33\u0a35\u0a36" +
"\u0a38\u0a39\u0a59\u0a5c\u0a5e\u0a5e\u0a72\u0a74" +
"\u0a85\u0a8b\u0a8d\u0a8d\u0a8f\u0a91\u0a93\u0aa8" +
"\u0aaa\u0ab0\u0ab2\u0ab3\u0ab5\u0ab9\u0abd\u0abd" +
"\u0ae0\u0ae0\u0b05\u0b0c\u0b0f\u0b10\u0b13\u0b28" +
"\u0b2a\u0b30\u0b32\u0b33\u0b36\u0b39\u0b3d\u0b3d" +
"\u0b5c\u0b5d\u0b5f\u0b61\u0b85\u0b8a\u0b8e\u0b90" +
"\u0b92\u0b95\u0b99\u0b9a\u0b9c\u0b9c\u0b9e\u0b9f" +
"\u0ba3\u0ba4\u0ba8\u0baa\u0bae\u0bb5\u0bb7\u0bb9" +
"\u0c05\u0c0c\u0c0e\u0c10\u0c12\u0c28\u0c2a\u0c33" +
"\u0c35\u0c39\u0c60\u0c61\u0c85\u0c8c\u0c8e\u0c90" +
"\u0c92\u0ca8\u0caa\u0cb3\u0cb5\u0cb9\u0cde\u0cde" +
"\u0ce0\u0ce1\u0d05\u0d0c\u0d0e\u0d10\u0d12\u0d28" +
"\u0d2a\u0d39\u0d60\u0d61\u0e01\u0e2e\u0e30\u0e30" +
"\u0e32\u0e33\u0e40\u0e45\u0e81\u0e82\u0e84\u0e84" +
"\u0e87\u0e88\u0e8a\u0e8a\u0e8d\u0e8d\u0e94\u0e97" +
"\u0e99\u0e9f\u0ea1\u0ea3\u0ea5\u0ea5\u0ea7\u0ea7" +
"\u0eaa\u0eab\u0ead\u0eae\u0eb0\u0eb0\u0eb2\u0eb3" +
"\u0ebd\u0ebd\u0ec0\u0ec4\u0f40\u0f47\u0f49\u0f69" +
"\u10a0\u10c5\u10d0\u10f6\u1100\u1100\u1102\u1103" +
"\u1105\u1107\u1109\u1109\u110b\u110c\u110e\u1112" +
"\u113c\u113c\u113e\u113e\u1140\u1140\u114c\u114c" +
"\u114e\u114e\u1150\u1150\u1154\u1155\u1159\u1159" +
"\u115f\u1161\u1163\u1163\u1165\u1165\u1167\u1167" +
"\u1169\u1169\u116d\u116e\u1172\u1173\u1175\u1175" +
"\u119e\u119e\u11a8\u11a8\u11ab\u11ab\u11ae\u11af" +
"\u11b7\u11b8\u11ba\u11ba\u11bc\u11c2\u11eb\u11eb" +
"\u11f0\u11f0\u11f9\u11f9\u1e00\u1e9b\u1ea0\u1ef9" +
"\u1f00\u1f15\u1f18\u1f1d\u1f20\u1f45\u1f48\u1f4d" +
"\u1f50\u1f57\u1f59\u1f59\u1f5b\u1f5b\u1f5d\u1f5d" +
"\u1f5f\u1f7d\u1f80\u1fb4\u1fb6\u1fbc\u1fbe\u1fbe" +
"\u1fc2\u1fc4\u1fc6\u1fcc\u1fd0\u1fd3\u1fd6\u1fdb" +
"\u1fe0\u1fec\u1ff2\u1ff4\u1ff6\u1ffc\u2126\u2126" +
"\u212a\u212b\u212e\u212e\u2180\u2182\u3007\u3007" +
"\u3021\u3029\u3041\u3094\u30a1\u30fa\u3105\u312c" +
"\u4e00\u9fa5\uac00\ud7a3";
const string s_NCName =
"\u002d\u002e\u0030\u0039\u0041\u005a\u005f\u005f" +
"\u0061\u007a\u00b7\u00b7\u00c0\u00d6\u00d8\u00f6" +
"\u00f8\u0131\u0134\u013e\u0141\u0148\u014a\u017e" +
"\u0180\u01c3\u01cd\u01f0\u01f4\u01f5\u01fa\u0217" +
"\u0250\u02a8\u02bb\u02c1\u02d0\u02d1\u0300\u0345" +
"\u0360\u0361\u0386\u038a\u038c\u038c\u038e\u03a1" +
"\u03a3\u03ce\u03d0\u03d6\u03da\u03da\u03dc\u03dc" +
"\u03de\u03de\u03e0\u03e0\u03e2\u03f3\u0401\u040c" +
"\u040e\u044f\u0451\u045c\u045e\u0481\u0483\u0486" +
"\u0490\u04c4\u04c7\u04c8\u04cb\u04cc\u04d0\u04eb" +
"\u04ee\u04f5\u04f8\u04f9\u0531\u0556\u0559\u0559" +
"\u0561\u0586\u0591\u05a1\u05a3\u05b9\u05bb\u05bd" +
"\u05bf\u05bf\u05c1\u05c2\u05c4\u05c4\u05d0\u05ea" +
"\u05f0\u05f2\u0621\u063a\u0640\u0652\u0660\u0669" +
"\u0670\u06b7\u06ba\u06be\u06c0\u06ce\u06d0\u06d3" +
"\u06d5\u06e8\u06ea\u06ed\u06f0\u06f9\u0901\u0903" +
"\u0905\u0939\u093c\u094d\u0951\u0954\u0958\u0963" +
"\u0966\u096f\u0981\u0983\u0985\u098c\u098f\u0990" +
"\u0993\u09a8\u09aa\u09b0\u09b2\u09b2\u09b6\u09b9" +
"\u09bc\u09bc\u09be\u09c4\u09c7\u09c8\u09cb\u09cd" +
"\u09d7\u09d7\u09dc\u09dd\u09df\u09e3\u09e6\u09f1" +
"\u0a02\u0a02\u0a05\u0a0a\u0a0f\u0a10\u0a13\u0a28" +
"\u0a2a\u0a30\u0a32\u0a33\u0a35\u0a36\u0a38\u0a39" +
"\u0a3c\u0a3c\u0a3e\u0a42\u0a47\u0a48\u0a4b\u0a4d" +
"\u0a59\u0a5c\u0a5e\u0a5e\u0a66\u0a74\u0a81\u0a83" +
"\u0a85\u0a8b\u0a8d\u0a8d\u0a8f\u0a91\u0a93\u0aa8" +
"\u0aaa\u0ab0\u0ab2\u0ab3\u0ab5\u0ab9\u0abc\u0ac5" +
"\u0ac7\u0ac9\u0acb\u0acd\u0ae0\u0ae0\u0ae6\u0aef" +
"\u0b01\u0b03\u0b05\u0b0c\u0b0f\u0b10\u0b13\u0b28" +
"\u0b2a\u0b30\u0b32\u0b33\u0b36\u0b39\u0b3c\u0b43" +
"\u0b47\u0b48\u0b4b\u0b4d\u0b56\u0b57\u0b5c\u0b5d" +
"\u0b5f\u0b61\u0b66\u0b6f\u0b82\u0b83\u0b85\u0b8a" +
"\u0b8e\u0b90\u0b92\u0b95\u0b99\u0b9a\u0b9c\u0b9c" +
"\u0b9e\u0b9f\u0ba3\u0ba4\u0ba8\u0baa\u0bae\u0bb5" +
"\u0bb7\u0bb9\u0bbe\u0bc2\u0bc6\u0bc8\u0bca\u0bcd" +
"\u0bd7\u0bd7\u0be7\u0bef\u0c01\u0c03\u0c05\u0c0c" +
"\u0c0e\u0c10\u0c12\u0c28\u0c2a\u0c33\u0c35\u0c39" +
"\u0c3e\u0c44\u0c46\u0c48\u0c4a\u0c4d\u0c55\u0c56" +
"\u0c60\u0c61\u0c66\u0c6f\u0c82\u0c83\u0c85\u0c8c" +
"\u0c8e\u0c90\u0c92\u0ca8\u0caa\u0cb3\u0cb5\u0cb9" +
"\u0cbe\u0cc4\u0cc6\u0cc8\u0cca\u0ccd\u0cd5\u0cd6" +
"\u0cde\u0cde\u0ce0\u0ce1\u0ce6\u0cef\u0d02\u0d03" +
"\u0d05\u0d0c\u0d0e\u0d10\u0d12\u0d28\u0d2a\u0d39" +
"\u0d3e\u0d43\u0d46\u0d48\u0d4a\u0d4d\u0d57\u0d57" +
"\u0d60\u0d61\u0d66\u0d6f\u0e01\u0e2e\u0e30\u0e3a" +
"\u0e40\u0e4e\u0e50\u0e59\u0e81\u0e82\u0e84\u0e84" +
"\u0e87\u0e88\u0e8a\u0e8a\u0e8d\u0e8d\u0e94\u0e97" +
"\u0e99\u0e9f\u0ea1\u0ea3\u0ea5\u0ea5\u0ea7\u0ea7" +
"\u0eaa\u0eab\u0ead\u0eae\u0eb0\u0eb9\u0ebb\u0ebd" +
"\u0ec0\u0ec4\u0ec6\u0ec6\u0ec8\u0ecd\u0ed0\u0ed9" +
"\u0f18\u0f19\u0f20\u0f29\u0f35\u0f35\u0f37\u0f37" +
"\u0f39\u0f39\u0f3e\u0f47\u0f49\u0f69\u0f71\u0f84" +
"\u0f86\u0f8b\u0f90\u0f95\u0f97\u0f97\u0f99\u0fad" +
"\u0fb1\u0fb7\u0fb9\u0fb9\u10a0\u10c5\u10d0\u10f6" +
"\u1100\u1100\u1102\u1103\u1105\u1107\u1109\u1109" +
"\u110b\u110c\u110e\u1112\u113c\u113c\u113e\u113e" +
"\u1140\u1140\u114c\u114c\u114e\u114e\u1150\u1150" +
"\u1154\u1155\u1159\u1159\u115f\u1161\u1163\u1163" +
"\u1165\u1165\u1167\u1167\u1169\u1169\u116d\u116e" +
"\u1172\u1173\u1175\u1175\u119e\u119e\u11a8\u11a8" +
"\u11ab\u11ab\u11ae\u11af\u11b7\u11b8\u11ba\u11ba" +
"\u11bc\u11c2\u11eb\u11eb\u11f0\u11f0\u11f9\u11f9" +
"\u1e00\u1e9b\u1ea0\u1ef9\u1f00\u1f15\u1f18\u1f1d" +
"\u1f20\u1f45\u1f48\u1f4d\u1f50\u1f57\u1f59\u1f59" +
"\u1f5b\u1f5b\u1f5d\u1f5d\u1f5f\u1f7d\u1f80\u1fb4" +
"\u1fb6\u1fbc\u1fbe\u1fbe\u1fc2\u1fc4\u1fc6\u1fcc" +
"\u1fd0\u1fd3\u1fd6\u1fdb\u1fe0\u1fec\u1ff2\u1ff4" +
"\u1ff6\u1ffc\u20d0\u20dc\u20e1\u20e1\u2126\u2126" +
"\u212a\u212b\u212e\u212e\u2180\u2182\u3005\u3005" +
"\u3007\u3007\u3021\u302f\u3031\u3035\u3041\u3094" +
"\u3099\u309a\u309d\u309e\u30a1\u30fa\u30fc\u30fe" +
"\u3105\u312c\u4e00\u9fa5\uac00\ud7a3";
#endif
As to why your problem occurs: s_NCStartName constant defines which characters can be used to start your name, and in fifth edition it is VERY short (you will not find 0xB5 part in new edition string), and there is no way around it. It is constant, hardcoded into System.Xml. Well, it is Microsoft, they don't care about backward compatability.
As to solution
You have four options:
Attach old System.Xml, which is in my opinion is worse solution, but easiest.
Preprosses your xml: clean it from invalid start name characters.
Blame Microsoft on task tracker about this: lose of backward compatability.
Use other serialization frameworks, for example Xml.Net
PS
Small advise. You can rewrite ConvertNode like this:
public static T ConvertNode<T>(XmlNode node)
{
using (var reader = new XmlNodeReader(node))
{
var ser = new XmlSerializer(typeof(T));
return (T) ser.Deserialize(reader);
}
}

Regular expression getting html between two comments

I am trying to get a snippet of HTML between to comments.
I will need to parse the HTML between the start/end later.
I am actually reading from an html file but for test purposes I mocked the following up:
string emailFeedTxtStart = "<!--FEED FOR RECEIPT GOES HERE-->";
string emailFeedTxtEnd = "<!--FEED FOR RECEIPT ENDS HERE-->";
string html =
emailFeedTxtStart + Environment.NewLine +
#"<td align=""center"">" + Environment.NewLine +
#"<table style=""table-layout:fixed;width:380px"" border=""0"" cellspacing=""0"" cellpadding=""0"">" + Environment.NewLine +
"<tbody>" + Environment.NewLine +
"<tr>" + Environment.NewLine +
"<td>" + Environment.NewLine +
"</td>" + Environment.NewLine +
"</tr>" + Environment.NewLine +
"</tbody>" + Environment.NewLine +
"</table>" + Environment.NewLine +
"</td>" + Environment.NewLine +
emailFeedTxtEnd;
string patternstart = Regex.Escape(emailFeedTxtStart);
string patternend = Regex.Escape(emailFeedTxtEnd);
string regexexpr = patternstart + #"(.*?)" + patternend;
//string regexexpr = #"(?<=" + patternstart + ")(.*?)(?=" + patternend + ")";
MatchCollection matches = Regex.Matches(#html, #regexexpr);
matches returned is 0.
(note there is a lot more HTML between the ).
Any help would be greatly appreciated.
What are you going to parse the HTML with after? Because there's probably a way you can just do away with actually manipulating the HTML string beforehand. Here's a solution anyway:
string afterFirst = html.Substring(Regex.Match(html, emailFeedTxtStart).Index + emailFeedTxtStart.Length);
string between = afterFirst.Substring(0, Regex.Match(afterFirst, emailFeedTxtEnd).Index);

Save HTML EDITOR CONTENT to Content-type .DOCX in C#

private void ConvertHTMLtoDOCX(string txtcode)
{
System.Text.StringBuilder strBody = new System.Text.StringBuilder("");
strBody.Append("<html " + "xmlns:o='urn:schemas-microsoft-com:office:office' " + "xmlns:w='urn:schemas-microsoft-com:office:word'" + "xmlns='http://www.w3.org/TR/REC-html40'>" + "<head><title>Time</title>");
//The setting specifies document's view after it is downloaded as Print
//instead of the default Web Layout
strBody.Append("<!--[if gte mso 9]>" + "<xml>" + "<w:WordDocument>" + "<w:View>Print</w:View>" + "<w:DoNotOptimizeForBrowser/>" + "</w:WordDocument>" + "</xml>" + "<![endif]-->");
strBody.Append("<style>" + "<!-- /* Style Definitions */" + "#page Section1" + " {size:8.5in 11.0in; " + " margin:1.0in 1.25in 1.0in 1.25in ; " + " mso-header-margin:.5in; " + " mso-footer-margin:.5in; mso-paper-source:0;}" + " div.Section1" + " {page:Section1;}" + "-->" + "</style></head>");
strBody.Append("<body lang=EN-US style='tab-interval:.5in'>" + "<div class=Section1>" + Html_editor.Content + "</div></body></html>");
//Force this content to be downloaded
//as a Word document with the name of your choice
string FullFilePath = #"C:\Users\ravikant\Desktop\AR GitHub\07-05-2014\FinalTestARGithub\LetterTemplate\"+ txtcode+ ".docx";
FileInfo file = new FileInfo(FullFilePath);
if (file.Exists)
{
ClientScript.RegisterStartupScript(this.GetType(), "disExp", "<script>alert('File Already Exists');</script>");
}
else
{
Response.AppendHeader("Content-Type", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
Response.AppendHeader("Content-disposition", "inline; filename="+txtcode+".docx");
Response.Write(strBody);
}
}
Here is the code using the CONTENT-TYPE for .DOCX "application/vnd.openxmlformats-officedocument.wordprocessingml.document", the Content is corrupt while opening the file.
Try this to find out what is going on with the file.
I've done this with native word .docx, but not .docx generated in this manner, so it may or may not work.
Make a copy of the saved file, change its extention from .docx to .zip.
Try and open that. We are trying to find a file document.xml, which is normally in the "word" folder.
Open that in a text editor an see if anything is jumping out as wrong or try putting it through an XML validator. VisualStudio should be good enough to show any malformating.
Online XML Validator that might help: http://www.xmlvalidation.com/
The following lines are also suspect:
strBody.Append("<!--[if gte mso 9]>" + "<xml>" + "<w:WordDocument>" + "<w:View>Print</w:View>" + "<w:DoNotOptimizeForBrowser/>" + "</w:WordDocument>" + "</xml>" + "<![endif]-->");
As I'm unsure how word will handle IE conditional comments. Comment out or remove this line and see what happens.
strBody.Append("<style>" + "<!-- /* Style Definitions */" + "#page Section1" + " {size:8.5in 11.0in; " + " margin:1.0in 1.25in 1.0in 1.25in ; " + " mso-header-margin:.5in; " + " mso-footer-margin:.5in; mso-paper-source:0;}" + " div.Section1" + " {page:Section1;}" + "-->" + "</style></head>");
Due to the nested comments. <!-- /* */-->. Perhaps try changing it to: strBody.Append("</head>"); and see if that works.

GitHub API update file

I'm trying to update a file via the GitHub API.
I've got everything setup, and in the end the actual file is updated, which is a good thing.
However, let's say I've got these 2 files in my repo
FileToBeUpdated.txt
README.md
And then I run my program
The FileToBeUpdated.txt is updated, like it should, however the README.md is deleted.
This is the code with the 5 steps to update the file:
private static void Main()
{
string shaForLatestCommit = GetSHAForLatestCommit();
string shaBaseTree = GetShaBaseTree(shaForLatestCommit);
string shaNewTree = CreateTree(shaBaseTree);
string shaNewCommit = CreateCommit(shaForLatestCommit, shaNewTree);
SetHeadPointer(shaNewCommit);
}
private static void SetHeadPointer(string shaNewCommit)
{
WebClient webClient = GetMeAFreshWebClient();
string contents = "{" +
"\"sha\":" + "\"" + shaNewCommit + "\", " +
// "\"force\":" + "\"true\"" +
"}";
// TODO validate ?
string downloadString = webClient.UploadString(Constants.Start + Constants.PathToRepo + "refs/heads/master", "PATCH", contents);
}
private static string CreateCommit(string latestCommit, string shaNewTree)
{
WebClient webClient = GetMeAFreshWebClient();
string contents = "{" +
"\"parents\" :[ \"" + latestCommit + "\" ], " +
"\"tree\":" + "\"" + shaNewTree + "\", " +
"\"message\":" + "\"test\", " +
"\"author\": { \"name\": \""+ Constants.Username +"\", \"email\": \""+ Constants.Email+"\",\"date\": \"" + DateTime.UtcNow.ToString("s", CultureInfo.InvariantCulture) + "\"}" +
"}";
string downloadString = webClient.UploadString(Constants.Start + Constants.PathToRepo + "commits", contents);
var foo = JsonConvert.DeserializeObject<CommitRootObject>(downloadString);
return foo.sha;
}
private static string CreateTree(string shaBaseTree)
{
WebClient webClient = GetMeAFreshWebClient();
string contents = "{" +
"\"tree\" :" +
"[ {" +
"\"base_tree\": " + "\"" + shaBaseTree + "\"" + "," +
"\"path\": " + "\"" + Constants.FileToChange + "\"" + " ," +
"\"content\":" + "\"" + DateTime.Now.ToLongTimeString() + "\"" +
"} ]" +
"}";
string downloadString = webClient.UploadString(Constants.Start + Constants.PathToRepo + "trees", contents);
var foo = JsonConvert.DeserializeObject<TreeRootObject>(downloadString);
return foo.sha;
}
private static string GetShaBaseTree(string latestCommit)
{
WebClient webClient = GetMeAFreshWebClient();
string downloadString = webClient.DownloadString(Constants.Start + Constants.PathToRepo + "commits/" + latestCommit);
var foo = JsonConvert.DeserializeObject<CommitRootObject>(downloadString);
return foo.tree.sha;
}
private static string GetSHAForLatestCommit()
{
WebClient webClient = GetMeAFreshWebClient();
string downloadString = webClient.DownloadString(Constants.Start + Constants.PathToRepo + "refs/heads/master");
var foo = JsonConvert.DeserializeObject<HeadRootObject>(downloadString);
return foo.#object.sha;
}
private static WebClient GetMeAFreshWebClient()
{
var webClient = new WebClient();
webClient.Headers.Add(string.Format("Authorization: token {0}", Constants.OAuthToken));
return webClient;
}
Which part am I missing? Am I using the wrong tree to start from?
It seems that in the function CreateTree I should set the base_tree at the root level, not at the level of each tree I create.
So in the function CreateTree the contents become this:
string contents = "{" +
"\"base_tree\": " + "\"" + shaBaseTree + "\"" + "," + // IT'S THIS LINE!
"\"tree\" :" +
"[ {" +
"\"path\": " + "\"" + Constants.FileToChange + "\"" + " ," +
"\"content\":" + "\"" + DateTime.Now.ToLongTimeString() + "\"" +
"} ]" +
"}";
Updated GitHub documentation: https://github.com/Snakiej/developer.github.com/commit/2c4f93003703f68c4c8a436df8cf18e615e293c7

Categories