Is it possible to have ';' not act as a comment delimiter only when it appears on the right-hand side of an equals sign?
I have the following INI file:
; some comment that should be ignored
[section]
key=value1;value2;value3
With the following code, Nini strips ";value2;value3" from the value because it treats that part as a comment:
using (TextReader tr = new StreamReader(iniFile))
{
IniDocument doc = new IniDocument(tr);
foreach (DictionaryEntry entry in doc.Sections)
{
string key = (string)entry.Key;
IniSection section = (IniSection)entry.Value;
if (section.Contains("key"))
{
// ... do stuff
}
}
}
Of course, I can do something like
IniReader ir = new IniReader(tr);
ir.SetCommentDelimiters(new char[] { '!' });
IniDocument doc = new IniDocument(ir);
but then the initial comment line is treated as configuration data and results in an error ("expecting =").
A quick scan of the code shows some useful properties you might be able to use:
result.AcceptCommentAfterKey = false;
result.SetCommentDelimiters (new char[] { ';', '#' });
Setting AcceptCommentAfterKey to false may help. Otherwise, you could change the comment delimiters and mark comments with whatever symbol you want.
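If neither Nini option fits, one alternative (a swapped-in, hand-rolled approach, not Nini's API) is a small parser that treats ';' or '#' as a comment only when it is the first non-blank character of a line, so everything after the first '=' is kept verbatim. A minimal sketch, assuming simple [section] headers and key=value lines; SimpleIni and Parse are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SimpleIni
{
    // Parses INI text into section -> (key -> value).
    // Assumption: ';' and '#' start a comment only at the beginning of a line,
    // so "key=value1;value2;value3" keeps its whole value.
    public static Dictionary<string, Dictionary<string, string>> Parse(TextReader reader)
    {
        var result = new Dictionary<string, Dictionary<string, string>>(StringComparer.OrdinalIgnoreCase);
        Dictionary<string, string> current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string trimmed = line.Trim();
            // Skip blank lines and full-line comments.
            if (trimmed.Length == 0 || trimmed[0] == ';' || trimmed[0] == '#')
                continue;
            // Section header: start (or reopen) a section.
            if (trimmed[0] == '[' && trimmed[trimmed.Length - 1] == ']')
            {
                string name = trimmed.Substring(1, trimmed.Length - 2).Trim();
                if (!result.TryGetValue(name, out current))
                {
                    current = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
                    result[name] = current;
                }
                continue;
            }
            // Ignore lines without a key before '=' or outside any section.
            int eq = trimmed.IndexOf('=');
            if (eq <= 0 || current == null)
                continue;
            current[trimmed.Substring(0, eq).Trim()] = trimmed.Substring(eq + 1).Trim();
        }
        return result;
    }
}
```

With the question's file, Parse(...)["section"]["key"] should give value1;value2;value3. The trade-off is that trailing comments after a value are no longer recognized at all.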
I have a text file with a certain format. First comes an identifier followed by three spaces and a colon. Then comes the value for this identifier.
ID1 :Value1
ID2 :Value2
ID3 :Value3
What I need to do is search for, e.g., ID2 : and replace Value2 with a new value, NewValue2. What would be a good way to do this? The files I need to parse won't get very large; the largest will be around 150 lines.
If the file isn't that big, you can use File.ReadAllLines to get all of the lines and then replace the line you're looking for, like this:
using System.IO;
using System.Linq;
using System.Collections.Generic;
List<string> lines = new List<string>(File.ReadAllLines("file"));
int lineIndex = lines.FindIndex(line => line.StartsWith("ID2 :"));
if (lineIndex != -1)
{
lines[lineIndex] = "ID2 :NewValue2";
File.WriteAllLines("file", lines);
}
Here's a simple solution which also creates a backup of the source file automatically.
The replacements are stored in a Dictionary object, keyed on the line's ID (e.g. 'ID2'), with the required replacement string as the value. Just call Add() to add more as required.
Dictionary<string, string> replacements = new Dictionary<string, string>();
replacements.Add("ID2", "NewValue2");
// ... further replacement entries ...
using (StreamWriter writer = File.CreateText("output.txt"))
{
foreach (string line in File.ReadLines("input.txt"))
{
bool replacementMade = false;
foreach (var replacement in replacements)
{
// Match "ID2 :" rather than just "ID2", so that e.g. "ID20" is not replaced too.
if (line.StartsWith(replacement.Key + " :"))
{
writer.WriteLine(string.Format("{0} :{1}",
replacement.Key, replacement.Value));
replacementMade = true;
break;
}
}
if (!replacementMade)
{
writer.WriteLine(line);
}
}
}
File.Replace("output.txt", "input.txt", "input.bak");
You'll just have to replace input.txt, output.txt and input.bak with the paths to your source, destination and backup files.
Ordinarily, for any text searching and replacement, I'd suggest some sort of regular expression work, but if this is all you're doing, that's really overkill.
I would just open the original file and a temporary file; read the original a line at a time, and just check each line for "ID2 :"; if you find it, write your replacement string to the temporary file, otherwise, just write what you read. When you've run out of source, close both, delete the original, and rename the temporary file to that of the original.
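The temporary-file approach described above might look like this. It is a sketch, assuming each ID is matched by a plain prefix such as "ID2 :" and that the temporary file can sit next to the original with a ".tmp" suffix; LineReplacer and ReplaceLine are hypothetical names:

```csharp
using System;
using System.IO;

public static class LineReplacer
{
    // Copies sourcePath line by line into a temporary file, writing
    // replacementLine instead of any line that starts with prefix,
    // then swaps the temporary file in for the original.
    public static void ReplaceLine(string sourcePath, string prefix, string replacementLine)
    {
        // Keep the temp file in the same folder so File.Move stays on one volume.
        string tempPath = sourcePath + ".tmp";
        using (var reader = new StreamReader(sourcePath))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(line.StartsWith(prefix, StringComparison.Ordinal) ? replacementLine : line);
            }
        }
        // Both files are closed here: delete the original, rename the temp file.
        File.Delete(sourcePath);
        File.Move(tempPath, sourcePath);
    }
}
```

For example, LineReplacer.ReplaceLine("file", "ID2 :", "ID2 :NewValue2") rewrites only the ID2 line.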
Something like this should work. It's very simple, not the most efficient thing, but for small files, it would be just fine:
private void setValue(string filePath, string key, string value)
{
string[] lines= File.ReadAllLines(filePath);
for(int x = 0; x < lines.Length; x++)
{
string[] fields = lines[x].Split(':');
if (fields[0].TrimEnd() == key)
{
lines[x] = fields[0] + ':' + value;
File.WriteAllLines(filePath, lines);
break;
}
}
}
You can use a regex and do it in three lines of code:
string text = File.ReadAllText("sourcefile.txt");
text = Regex.Replace(text, @"(?i)(?<=^id2\s*?:\s*?)\w*?(?=\s*?$)", "NewValue2",
RegexOptions.Multiline);
File.WriteAllText("outputfile.txt", text);
In the regex, (?i)(?<=^id2\s*?:\s*?)\w*?(?=\s*?$) means: case-insensitively, find a line that starts with id2, followed by any amount of whitespace before and after the colon, and replace the string that follows (word characters only, i.e. letters, digits and underscores, no punctuation) up to the end of the line. If you want to include punctuation, replace \w*? with .*?
You can use regexes to achieve this.
Regex re = new Regex(@"^ID\d+ :Value(\d+)\s*$", RegexOptions.IgnoreCase | RegexOptions.Compiled);
string[] lines = File.ReadAllLines("mytextfile");
foreach (string line in lines) {
string replaced = re.Replace(line, processMatch);
// Now do what you need with the replaced line
}
string processMatch(Match m)
{
string number = m.Groups[1].Value;
return String.Format("ID{0} :NewValue{0}", number);
}
.NET's XmlTextWriter creates invalid XML files.
In XML, some control characters are allowed, like 'horizontal tab' (&#x9;), but others are not, like 'vertical tab' (&#xB;). (See spec.)
I have a string which contains a UTF-8 control character that is not allowed in XML.
Although XmlTextWriter escapes the character, the resulting XML is of course still invalid.
How can I make sure that XmlTextWriter never produces an illegal XML file?
Or, if it's not possible to do this with XmlTextWriter, how can I strip the specific control characters that aren't allowed in XML from a string?
Example code:
using (XmlTextWriter writer =
new XmlTextWriter("test.xml", Encoding.UTF8))
{
writer.WriteStartDocument();
writer.WriteStartElement("Test");
writer.WriteValue("hello \xb world");
writer.WriteEndElement();
writer.WriteEndDocument();
}
Output:
<?xml version="1.0" encoding="utf-8"?><Test>hello &#xB; world</Test>
This behaviour is documented, somewhat hidden away, under the WriteString method, but it sounds like it applies to the whole class:
The default behavior of an XmlWriter created using Create is to throw an ArgumentException when attempting to write character values in the range 0x0-0x1F (excluding white space characters 0x9, 0xA, and 0xD). These invalid XML characters can be written by creating the XmlWriter with the CheckCharacters property set to false. Doing so will result in the characters being replaced with numeric character entities (&#0; through &#0x1F). Additionally, an XmlTextWriter created with the new operator will replace the invalid characters with numeric character entities by default.
So it seems that you end up writing invalid characters because you are using the XmlTextWriter class. A better solution would be to create your writer through XmlWriter.Create instead, which by default throws rather than writing invalid XML.
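A minimal sketch of that difference, assuming default XmlWriterSettings (CheckCharacters = true); StrictXml and TryWriteTestElement are hypothetical names. Instead of silently entitizing the vertical tab, the writer throws an ArgumentException, which the helper turns into a false return value:

```csharp
using System;
using System.Text;
using System.Xml;

public static class StrictXml
{
    // Writes <Test>text</Test> using a writer from XmlWriter.Create.
    // With the default CheckCharacters = true, characters such as U+000B
    // are rejected with an ArgumentException instead of being entitized.
    public static bool TryWriteTestElement(string text, out string xml)
    {
        var sb = new StringBuilder();
        try
        {
            using (XmlWriter writer = XmlWriter.Create(sb))
            {
                writer.WriteStartElement("Test");
                writer.WriteString(text);
                writer.WriteEndElement();
            }
            xml = sb.ToString();
            return true;
        }
        catch (ArgumentException)
        {
            // The text contained a character that is not allowed in XML.
            xml = null;
            return false;
        }
    }
}
```

Failing loudly at write time means the invalid document never reaches disk, and the caller can decide whether to strip or reject the data.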
I just found this question while struggling with the same issue, and I ended up solving it with a regex:
return Regex.Replace(s, @"[\u0000-\u0008\u000B\u000C\u000E-\u001F]", "");
Hope it helps someone as an alternative solution.
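To show that regex end to end, here is a sketch that wraps the pattern in a helper and then writes the cleaned text with a default XmlWriter.Create writer; XmlSanitizer, StripControlChars and WriteTestElement are hypothetical names. Note that the pattern only covers the C0 control characters; it does not deal with unpaired surrogates or U+FFFE/U+FFFF:

```csharp
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;

public static class XmlSanitizer
{
    // C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
    private static readonly Regex InvalidControlChars =
        new Regex(@"[\u0000-\u0008\u000B\u000C\u000E-\u001F]");

    public static string StripControlChars(string s) => InvalidControlChars.Replace(s, "");

    // Strips the forbidden characters first, so a default writer
    // (CheckCharacters = true) can write the text without throwing.
    public static string WriteTestElement(string text)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Test");
            writer.WriteString(StripControlChars(text));
            writer.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

With the question's input, the vertical tab simply disappears, leaving two spaces between "hello" and "world".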
Built-in .NET escapers such as SecurityElement.Escape don't properly escape or strip it either.
You could set CheckCharacters to false on both the writer and the reader if your application is the only one interacting with the file. The resulting XML file would still be technically invalid though.
See:
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Encoding = new UTF8Encoding(false);
xmlWriterSettings.CheckCharacters = false;
var sb = new StringBuilder();
var w = XmlWriter.Create(sb, xmlWriterSettings);
w.WriteStartDocument();
w.WriteStartElement("Test");
w.WriteString("hello \xb world");
w.WriteEndElement();
w.WriteEndDocument();
w.Close();
var xml = sb.ToString();
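The snippet above covers only the writer. A sketch of both sides, assuming your application is the only consumer of the file; LenientXmlRoundTrip, Write and Read are hypothetical names. With CheckCharacters = false, the writer should emit the vertical tab as a numeric character reference, and a reader with CheckCharacters = false should accept it again:

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class LenientXmlRoundTrip
{
    // Writer side: invalid characters become numeric character references
    // instead of causing an exception.
    public static string Write(string text)
    {
        var settings = new XmlWriterSettings { CheckCharacters = false, OmitXmlDeclaration = true };
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteElementString("Test", text);
        }
        return sb.ToString();
    }

    // Reader side: CheckCharacters = false lets the reader accept those references.
    public static string Read(string xml)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.ReadToFollowing("Test");
            return reader.ReadElementContentAsString();
        }
    }
}
```

Any other consumer with default settings will still reject the file, which is why this only makes sense for a closed loop.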
If leaving CheckCharacters set to true (the default) is too strict, since it simply throws an exception, a more lenient alternative is to strip the invalid XML characters:
Googling a bit turned up the whitelist-based XmlTextEncoder; however, it also removes DEL and the other characters in the ranges U+007F–U+0084 and U+0086–U+009F, which, according to "Valid XML characters" on Wikipedia, are only valid in certain contexts and which the RFC mentions as discouraged but still valid characters.
using System.Collections.Generic;
using System.Linq;
using System.Xml;
public static class XmlTextExtensions
{
private static readonly Dictionary<char, string> textEntities = new Dictionary<char, string> {
{ '&', "&amp;" }, { '<', "&lt;" }, { '>', "&gt;" },
{ '"', "&quot;" }, { '\'', "&apos;" }
};
public static string ToValidXmlString(this string str)
{
var stripped = str
.Select((c,i) => new
{
c1 = c,
c2 = i + 1 < str.Length ? str[i+1]: default(char),
v = XmlConvert.IsXmlChar(c),
p = i + 1 < str.Length ? XmlConvert.IsXmlSurrogatePair(str[i + 1], c) : false,
pp = i > 0 ? XmlConvert.IsXmlSurrogatePair(c, str[i - 1]) : false
})
.Aggregate("", (s, c) => {
if (c.pp)
return s;
if (textEntities.ContainsKey(c.c1))
s += textEntities[c.c1];
else if (c.v)
s += c.c1.ToString();
else if (c.p)
s += c.c1.ToString() + c.c2.ToString();
return s;
});
return stripped;
}
}
This passes all the XmlTextEncoder tests except the one that expects it to strip DEL, which XmlConvert.IsXmlChar, Wikipedia, and the spec all mark as a valid (although discouraged) character.
I have a problem with an exercise. The input data is a set of sentences (string[] sentences). The requirement is to find each emoticon (e.g. :D) in each sentence, replace it with the corresponding smiley image, and then export the result to an .html file.
The text file that defines the emoticons and smileys has the following structure:
[imagename] tab [emoticon1] space [emoticon2] space [emoticon3] ...
smile.gif :) :-) :=) (smile)
sadsmile.gif :( :-( :=( (sad)
laugh.gif :D :-D (laugh)
...
The first issue is which C# data structure to use to store the emoticons and smileys.
I'm happy :). How are you? -> I'm happy <img src="smile"> How are you?
The second issue is how to code the search and replacement of the emoticons.
The last issue is that, because the exported file is HTML, the text must be HTML-encoded, perhaps with HttpUtility.HtmlEncode(...). But the resulting sentence contains an <img ...> tag, so I think this is tied to the second issue...
Please help me solve the problems above. Thanks so much!
First, you need to load the smiley "mappings" into a dictionary:
Dictionary<string, string> LoadSmileys(string fileName)
{
var smileys = new Dictionary<string, string>();
using (var reader = new StreamReader(fileName))
{
string line;
while ((line = reader.ReadLine()) != null)
{
string[] parts = line.Split(new[] { '\t', ' ' }, StringSplitOptions.RemoveEmptyEntries); // image name, then the emoticons
for (int i = 1; i < parts.Length; i++)
{
smileys[parts[i]] = parts[0];
}
}
}
return smileys;
}
Then, just loop over the keys and replace each occurrence of a key with the corresponding image. To avoid the problem mentioned in your comment to Carra's answer, replace the longest keys first:
StringBuilder tmp = new StringBuilder(originalText);
foreach (var key in smileys.Keys.OrderByDescending(s => s.Length))
{
tmp.Replace(key, GetImageLink(smileys[key]));
}
Note the use of a StringBuilder, to avoid creating many instances of String.
It's obviously not the most efficient approach, but at least it's simple... you can always try to optimize it later if it turns out to be a performance bottleneck.
UPDATE
OK, so there is still a problem if some of your smileys include reserved HTML characters like '<' or '>'. If you encode the text to HTML before replacing the smileys, these characters will be replaced with &lt; or &gt;, so the smileys won't be recognized. On the other hand, if you encode the text after replacing the smileys with <img> tags, the tags will be encoded as well.
Here's what you could do:
1. Assign a unique identifier to each smiley, something unlikely to appear in the original text, like a GUID.
2. Replace each occurrence of each smiley with the corresponding identifier (again, starting with the longest smiley).
3. Encode the resulting text to HTML.
4. Replace each occurrence of each smiley identifier with the appropriate <img> tag.
var mapping = LoadSmileys(@"D:\tmp\smileys.txt");
var smileys = mapping.Keys.OrderByDescending(s => s.Length)
.ToArray();
// Assign an ID like "{93e8b75a-6837-43f8-95ec-801ed59bc167}" to each smiley
var ids = smileys.Select(key => Guid.NewGuid().ToString("B"))
.ToArray();
string text = File.ReadAllText(@"D:\tmp\test_smileys.txt");
// Replace each smiley with its id
StringBuilder tmp = new StringBuilder(text);
for (int i = 0; i < smileys.Length; i++)
{
tmp.Replace(smileys[i], ids[i]);
}
// Encode the text to HTML
text = HttpUtility.HtmlEncode(tmp.ToString());
// Replace each id with the appropriate <img> tag
tmp = new StringBuilder(text);
for (int i = 0; i < smileys.Length; i++)
{
string image = mapping[smileys[i]];
tmp.Replace(ids[i], GetImageLink(image));
}
text = tmp.ToString();
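For reference, a self-contained sketch of the whole placeholder approach. It makes a few assumptions beyond the answer: mapping lines are split on both tabs and spaces (the question's format separates emoticons with spaces), System.Net.WebUtility.HtmlEncode is used instead of HttpUtility.HtmlEncode to avoid a System.Web dependency, and GetImageLink, which the answer never defines, is a hypothetical helper that builds a bare <img> tag:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;

public static class SmileyHtml
{
    // Parses lines like "smile.gif<TAB>:) :-) :=) (smile)" into emoticon -> image.
    public static Dictionary<string, string> ParseMappings(IEnumerable<string> lines)
    {
        var map = new Dictionary<string, string>();
        foreach (string line in lines)
        {
            string[] parts = line.Split(new[] { '\t', ' ' }, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 1; i < parts.Length; i++)
                map[parts[i]] = parts[0];
        }
        return map;
    }

    // Hypothetical helper: builds the <img> tag for an image file name.
    static string GetImageLink(string image) =>
        "<img src=\"" + WebUtility.HtmlEncode(image) + "\">";

    public static string ToHtml(string text, Dictionary<string, string> map)
    {
        // Longest emoticons first, so longer ones are not broken up by shorter ones.
        string[] smileys = map.Keys.OrderByDescending(k => k.Length).ToArray();
        // One GUID placeholder per smiley; GUIDs contain nothing HTML-encoding changes.
        string[] ids = smileys.Select(_ => Guid.NewGuid().ToString("B")).ToArray();

        var sb = new StringBuilder(text);
        for (int i = 0; i < smileys.Length; i++)
            sb.Replace(smileys[i], ids[i]);

        // Encode everything else, then swap the placeholders for the real tags.
        sb = new StringBuilder(WebUtility.HtmlEncode(sb.ToString()));
        for (int i = 0; i < smileys.Length; i++)
            sb.Replace(ids[i], GetImageLink(map[smileys[i]]));
        return sb.ToString();
    }
}
```

For instance, "I am happy :) <b>" should become I am happy <img src="smile.gif"> &lt;b&gt;, with the user's markup encoded and only the generated tag left as real HTML.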
You can use a simple string.Replace here:
for (int i = 0; i < sentences.Length; i++)
{
foreach (var kvp in dict)
{
sentences[i] = sentences[i].Replace(kvp.Key, GetImageLink(kvp.Value));
}
}
To create the HTML, you're better off using the built-in .NET classes such as HtmlTextWriter or XmlWriter.
I have a C# ASP.NET page that has to get username/password info from a text file.
Could someone please tell me how?
The text file looks as follows (it is actually a lot larger; I've only included a few lines):
DATASOURCEFILE=D:\folder\folder
var1= etc
var2= more
var3 = misc
var4 = stuff
USERID = user1
PASSWORD = pwd1
All I need is the USERID and PASSWORD values from that file.
Thank you for your help,
Steve
This would work:
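var dic = File.ReadAllLines("test.txt")
.Where(l => l.Contains("=")) // skip blank lines and lines without a setting
.Select(l => l.Split(new[] { '=' }, 2)) // split on the first '=' only
.ToDictionary(s => s[0].Trim(), s => s[1].Trim());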
dic is a dictionary, so you can easily extract your values, e.g.:
string myUser = dic["USERID"];
string myPassword = dic["PASSWORD"];
Open the file, split on the newline, split again on the = for each item and then add it to a dictionary.
string contents = String.Empty;
using (FileStream fs = File.Open("path", FileMode.OpenRead))
using (StreamReader reader = new StreamReader(fs))
{
contents = reader.ReadToEnd();
}
if (contents.Length > 0)
{
string[] lines = contents.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
Dictionary<string, string> mysettings = new Dictionary<string, string>();
foreach (string line in lines)
{
// Split on the first '=' only, and skip lines that have no '='.
string[] keyAndValue = line.Split(new char[] { '=' }, 2);
if (keyAndValue.Length == 2)
mysettings[keyAndValue[0].Trim()] = keyAndValue[1].Trim();
}
string test = mysettings["USERID"]; // example of getting userid
}
You can use regular expressions to extract each variable. You can either read one line at a time or read the entire file into one string; in the latter case, your expression just needs to account for the line breaks.
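A sketch of the whole-file variant, assuming "NAME = value" lines as in the question; ConfigRegex and GetValue are hypothetical names. RegexOptions.Multiline makes ^ and $ match at line boundaries, and [ \t\r]* absorbs a trailing carriage return from Windows line endings:

```csharp
using System.Text.RegularExpressions;

public static class ConfigRegex
{
    // Returns the value of a "NAME = value" line in the whole file text, or null.
    public static string GetValue(string fileText, string name)
    {
        // [ \t] instead of \s keeps the match from running across lines.
        string pattern = @"^[ \t]*" + Regex.Escape(name) + @"[ \t]*=[ \t]*(.*?)[ \t\r]*$";
        Match m = Regex.Match(fileText, pattern, RegexOptions.Multiline | RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Usage would be along the lines of string text = File.ReadAllText(path); string user = ConfigRegex.GetValue(text, "USERID");.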
Regards,
Morten
A dictionary is not needed.
Old-fashioned parsing can do more, with less executable code, the same amount of compiled data, and less processing:
public string MyPath1;
public string MyPath2;
...
public void ReadConfig(string sConfigFile)
{
MyPath1 = MyPath2 = ""; // Clear the external values (in case the file does not set every parameter).
using (StreamReader sr = new StreamReader(sConfigFile)) // Open the file for reading (and auto-close).
{
while (!sr.EndOfStream)
{
string sLine = sr.ReadLine().Trim(); // Read the next line. Trim leading and trailing whitespace.
// Treat lines with NO "=" as comments (ignore; no syntax checking).
// Treat lines with "=" as the first character as comments too.
// Treat lines with "=" as the 2nd character or after as parameter lines.
// Side-benefit: Values containing "=" are processed correctly.
int i = sLine.IndexOf("="); // Find the first "=" in the line.
if (i <= 0) // IF the first "=" in the line is the first character (or not present),
continue; // the line is not a parameter line. Ignore it. (Iterate the while.)
string sParameter = sLine.Remove(i).TrimEnd(); // All before the "=" is the parameter name. Trim whitespace.
string sValue = sLine.Substring(i + 1).TrimStart(); // All after the "=" is the value. Trim whitespace.
// Extra characters before a parameter name are usually intended to comment it out. Here, we keep them (with or without whitespace between). That makes an unrecognized parameter name, which is ignored (acts as a comment, as intended).
// Extra characters after a value are usually intended as comments. Here, we trim them only if whitespace separates. (Parsing contiguous comments is too complex: need delimiter(s) and then a way to escape delimiters (when needed) within values.) Side-drawback: Values cannot contain " ".
i = sValue.IndexOfAny(new char[] {' ', '\t'}); // Find the first " " or tab in the value.
if (i > 0) // IF the first " " or tab is the second character or after (index 1 or more),
sValue = sValue.Remove(i); // All before the " " or tab is the value. (Discard the rest.)
// IF a desired parameter is specified, collect it:
// (Could detect here if any parameter is set more than once.)
if (sParameter == "MyPathOne")
MyPath1 = sValue;
else if (sParameter == "MyPathTwo")
MyPath2 = sValue;
// (Could detect here if an invalid parameter name is specified.)
// (Could exit the loop here if every parameter has been set.)
} // end while
// (Could detect here if the config file set neither parameter or only one parameter.)
} // end using
}
Using C#, how would you go about converting a string that contains newline characters and tabs (4 spaces) from the following format
A {
B {
C = D
E = F
}
G = H
}
into the following
A.B.C = D
A.B.E = F
A.G = H
Note that A to H are just placeholders for string values, which will not contain '{', '}', or '=' characters. The above is just an example; the actual string to convert can be nested arbitrarily deep and can contain any number of "? = ?" lines.
You probably want to parse this and then generate the desired format. Trying to do regex transforms isn't going to get you anywhere.
Tokenize the string, then go through the tokens and build up a syntax tree. Then walk the tree generating the output.
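A sketch of the parse-then-walk approach, assuming one construct per line ("name {", "}", or "name = value") as in the example, so each trimmed line acts as a token; NestedFlattener and its members are hypothetical. It builds the tree recursively and then walks it, prefixing each leaf with the dotted path of its enclosing blocks:

```csharp
using System.Collections.Generic;
using System.Text;

public static class NestedFlattener
{
    // A node is either a named block with children or a "name = value" leaf.
    sealed class Node
    {
        public string Name;
        public string Value; // null for blocks
        public List<Node> Children = new List<Node>();
    }

    public static string Flatten(string input)
    {
        var root = new Node { Name = "" };
        string[] lines = input.Split('\n');
        int i = 0;
        ParseBlock(lines, ref i, root);
        var output = new StringBuilder();
        Walk(root, "", output);
        return output.ToString();
    }

    // Consumes lines until the closing '}' of the current block (or end of input).
    static void ParseBlock(string[] lines, ref int i, Node parent)
    {
        while (i < lines.Length)
        {
            string line = lines[i++].Trim();
            if (line.Length == 0) continue;
            if (line == "}") return;
            if (line.EndsWith("{"))
            {
                var block = new Node { Name = line.Substring(0, line.Length - 1).Trim() };
                parent.Children.Add(block);
                ParseBlock(lines, ref i, block); // recurse into the nested block
            }
            else
            {
                int eq = line.IndexOf('=');
                if (eq < 0) continue; // not a declaration; ignore
                parent.Children.Add(new Node
                {
                    Name = line.Substring(0, eq).Trim(),
                    Value = line.Substring(eq + 1).Trim()
                });
            }
        }
    }

    // Depth-first walk in document order, emitting "A.B.C = D" lines.
    static void Walk(Node node, string prefix, StringBuilder output)
    {
        foreach (Node child in node.Children)
        {
            string path = prefix.Length == 0 ? child.Name : prefix + "." + child.Name;
            if (child.Value == null)
                Walk(child, path, output);
            else
                output.Append(path).Append(" = ").Append(child.Value).Append('\n');
        }
    }
}
```

Keeping the tree separate from the output step makes it easy to produce other formats from the same parse later.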
Alternatively, push each "namespace" onto a stack as you encounter it, and pop it off when you encounter the closing brace.
Not very pretty, but here's an implementation that uses a stack:
static string Rewrite(string input)
{
var builder = new StringBuilder();
var stack = new Stack<string>();
string[] lines = input.Split('\n');
foreach (var s in lines)
{
if (s.Contains("{") || s.Contains("="))
{
stack.Push(s.Replace("{", String.Empty).Trim());
}
if (s.Contains("="))
{
builder.Append(string.Join(".", stack.Reverse().ToArray()));
builder.Append(Environment.NewLine);
}
if (s.Contains("}") || s.Contains("="))
{
stack.Pop();
}
}
return builder.ToString();
}
Pseudocode for a recursive version of the stack method:
function do_processing(namespace, stack)
push namespace onto stack
for each sub namespace of namespace
do_processing(sub namespace, stack)
end
for each variable declaration in namespace
make_variable_declaration(stack, variable declaration)
end
pop namespace from stack
end
You can do this with regular expressions; it's just not the most efficient way to do it, since you need to scan the string multiple times.
while (s.Contains("{")) {
s = Regex.Replace(s, @"([^\s{}]+)\s*\{([^{}]+)\}", match => {
return Regex.Replace(match.Groups[2].Value,
@"\s*(.*\n)",
match.Groups[1].Value + ".$1");
});
}
Result:
A.B.C = D
A.B.E = F
A.G = H
I still think using a parser and/or stack based approach is the best way to do this, but I just thought I'd offer an alternative.