I am having a bit of a problem with escape characters in a string that I am reading from a txt file.
They are causing an error later in my program. They need to be removed, but I can't seem to filter them out.
public static List<string> loadData(string type)
{
List<string> dataList = new List<string>();
try
{
string path = Path.Combine(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location), "Data");
string text = File.ReadAllText(path + type);
string[] dataArray = text.Split(',');
foreach (var data in dataArray)
{
string dataUnescaped = Regex.Unescape(data);
if (!string.IsNullOrEmpty(dataUnescaped) && (!dataUnescaped.Contains(@"\r") || (!dataUnescaped.Contains(@"\n"))))
{
dataList.Add(data);
}
}
return dataList;
}
catch(Exception e)
{
Console.WriteLine(e);
return dataList;
}
}
I have tried text.Replace(@"\r\n")
and an if statement, but I just can't seem to remove them from my string.
Any ideas will be appreciated.
If you add the @ sign before a string literal, you get a verbatim string, in which backslashes are not treated as escape characters.
So if you wanted a path without @ you would need to do this:
string s = "c:\\myfolder\\myfile.txt";
But if you add the @ before your \r\n, then instead of the escape sequence for a Windows new line you get the literal four-character string "\r\n" (backslash, r, backslash, n).
So this would remove all occurrences of that literal text instead of the new lines you want to remove:
text.Replace(@"\r\n")
To fix that you would need to use:
text = text.Replace(Environment.NewLine, string.Empty);
You can use Environment.NewLine instead of \r and \n because Environment knows which OS you are currently running on and returns the new line sequence for that platform. Note that this only matches files whose line endings follow the current platform's convention.
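To make the difference concrete, here is a minimal sketch. The class and method names are hypothetical and not from the question. It removes both line-break characters individually, so it should work whether the file uses \r\n or just \n, and it shows that the verbatim literal really is four characters long.

```csharp
using System;

public static class LineBreakDemo
{
    // Removes every carriage return and line feed, whichever
    // line-ending convention the file happens to use.
    public static string StripLineBreaks(string text)
    {
        return text.Replace("\r", string.Empty).Replace("\n", string.Empty);
    }

    // A verbatim literal keeps the backslashes as ordinary characters,
    // so this is backslash, r, backslash, n: four characters, not two.
    public static int VerbatimLength()
    {
        return @"\r\n".Length;
    }
}
```

In the question's loadData method this would be applied to text right after File.ReadAllText and before the Split(',') call.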
I'm having issues doing a find/replace type of action in my function. I'm extracting the <a href="link">anchor from an article and replacing it with this format: [link anchor]. The link and anchor will be dynamic, so I can't hard-code the values. What I have so far is:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
string theString = string.Empty;
switch (articleWikiCheck) {
case "id|wpTextbox1":
StringBuilder newHtml = new StringBuilder(articleBody);
Regex r = new Regex(@"\<a href=\""([^\""]+)\"">([^<]+)");
string final = string.Empty;
foreach (var match in r.Matches(theString).Cast<Match>().OrderByDescending(m => m.Index))
{
string text = match.Groups[2].Value;
string newHref = "[" + match.Groups[1].Index + " " + match.Groups[1].Index + "]";
newHtml.Remove(match.Groups[1].Index, match.Groups[1].Length);
newHtml.Insert(match.Groups[1].Index, newHref);
}
theString = newHtml.ToString();
break;
default:
theString = articleBody;
break;
}
Helpers.ReturnMessage(theString);
return theString;
}
Currently, it just returns the article as it originally is, with the traditional anchor format: <a href="link">anchor
Can anyone see what I have done wrong?
Regards
If your input is HTML, you should consider using a corresponding parser, HtmlAgilityPack being really helpful.
As for the current code, it looks too verbose. You may use a single Regex.Replace to perform the search and replace in one pass:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody, @"<a\s+href=""([^""]+)"">([^<]+)", "[$1 $2]");
}
else
{
// Helpers.ReturnMessage(articleBody); // Uncomment if it is necessary
return articleBody;
}
}
See the regex demo.
The <a\s+href="([^"]+)">([^<]+) regex matches <a, 1 or more whitespaces, href=", then captures into Group 1 any one or more chars other than ", then matches "> and then captures into Group 2 any one or more chars other than <.
The [$1 $2] replacement replaces the matched text with [, Group 1 contents, space, Group 2 contents and a ].
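As a quick illustration of the replacement described above, here is a small sketch with hypothetical class and method names. One thing it makes visible: the pattern stops before the closing tag, so </a> stays in the output. The second method is my own variant, not part of the answer, that also consumes </a>.

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Pattern from the answer: the closing </a> is not part of the match,
    // so it remains in the result.
    public static string ConvertKeepingClosingTag(string html)
    {
        return Regex.Replace(html, @"<a\s+href=""([^""]+)"">([^<]+)", "[$1 $2]");
    }

    // Variant (an assumption, not from the answer) that also matches </a>,
    // so the whole anchor element is replaced.
    public static string ConvertWholeAnchor(string html)
    {
        return Regex.Replace(html, @"<a\s+href=""([^""]+)"">([^<]+)</a>", "[$1 $2]");
    }
}
```

Neither pattern handles anchors with extra attributes or nested tags, which is where an HTML parser becomes the safer choice.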
Updated (Corrected regex to support whitespaces and new lines)
You can try this expression
Regex r = new Regex(@"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>");
It will match your anchors even if they are split across multiple lines. It is so long because it allows whitespace between the tags and their values, and C# does not support regex subroutines, so the [\s\n]* part has to be repeated multiple times.
You can see a working sample at dotnetfiddle
You can use it in your example like this.
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody,
@"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>",
"[${link} ${anchor}]");
}
else
{
return articleBody;
}
}
I have a really bizarre problem with trim method. I'm trying to trim a string received from database. Here's my current method:
string debug = row["PLC_ADDR1_RESULT"].ToString();
SPCFileLog.WriteToLog(String.Format("Debug: ${0}${1}",debug,Environment.NewLine));
debug = debug.Trim();
SPCFileLog.WriteToLog(String.Format("Debug2: ${0}${1}", debug, Environment.NewLine));
debug = debug.Replace(" ", "");
SPCFileLog.WriteToLog(String.Format("Debug3: ${0}${1}", debug, Environment.NewLine));
Which produces file output as following:
Debug: $ $
Debug2: $ $
Debug3: $ $
Examining the hex codes in file revealed something interesting. The supposedly empty spaces aren't hex 20 (whitespace), but they are set as 00 (null?)
How our database contains such data is another mystery, but regardless, I need to trim those invalid (?) null characters. How can I do this?
If you just want to remove all null characters from a string, try this:
debug = debug.Replace("\0", string.Empty);
If you only want to remove them from the ends of the string:
debug = debug.Trim('\0');
There's nothing special about null characters, but they aren't considered white space.
String.Trim() just doesn't consider the NUL character (\0) to be whitespace. Ultimately, it calls Char.IsWhiteSpace to determine whitespace, which doesn't treat it as such.
Frankly, I think that makes sense. Typically \0 is not whitespace.
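The three options discussed in these answers can be compared side by side. This is only a sketch with hypothetical helper names. The last method is an assumption about what the asker may ultimately want: stripping NUL characters and ordinary whitespace from the ends in one call.

```csharp
using System;

public static class NulDemo
{
    // Removes NUL characters from both ends only.
    public static string TrimNuls(string value)
    {
        return value.Trim('\0');
    }

    // Removes every NUL character, including ones in the middle.
    public static string RemoveNuls(string value)
    {
        return value.Replace("\0", string.Empty);
    }

    // Trims NULs together with a few common whitespace characters.
    public static string TrimNulsAndWhiteSpace(string value)
    {
        return value.Trim('\0', ' ', '\t', '\r', '\n');
    }
}
```

For the database value in the question, which appears to consist only of NUL characters, either of the first two methods should return an empty string.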
@Will Vousden got me on the right track...
https://stackoverflow.com/a/32624301/12157575
...but instead of trying to rewrite or remove the line, I filtered out lines that start with the control character in the LINQ statement, before they reach the StreamReader / StreamWriter:
string ctrlChar = "\0"; // "NUL" in notepad++
// linq statement: "where"
!line.StartsWith(ctrlChar)
// could also easily do "Contains" instead of "StartsWith"
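For more context:
using System;
using System.IO;
using System.Linq;
using static System.Console; // provides Out, WriteLine and ReadLine used below
internal class Program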
{
private static void Main(string[] args)
{
// dbl space writelines
Out.NewLine = "\r\n\r\n";
WriteLine("Starting Parse Mode...");
string inputFilePath = @"C:\_logs\_input";
string outputFilePath = @"C:\_logs\_output\";
string ouputFileName = @"consolidated_logs.txt";
// chars starting lines we don't want to parse
string hashtag = "#"; // logs notes
string whtSpace = " "; // white space char
string ctrlChar = "\0"; // "NUL" in notepad++
try
{
var files =
from file in Directory.EnumerateFiles(inputFilePath, "*.log", SearchOption.TopDirectoryOnly)
from line in File.ReadLines(file)
where !line.StartsWith(hashtag) &&
!line.StartsWith(whtSpace) &&
line != null &&
!string.IsNullOrWhiteSpace(line) &&
!line.StartsWith(ctrlChar) // CTRL CHAR FILTER
select new
{
File = file,
Line = line
};
using (StreamWriter writer = new StreamWriter(outputFilePath + ouputFileName, true))
{
foreach (var f in files)
{
writer.WriteLine($"{f.File},{f.Line}");
WriteLine($"{f.File},{f.Line}"); // see console
}
WriteLine($"{files.Count()} lines found.");
ReadLine(); // keep console open
}
}
catch (UnauthorizedAccessException uAEx)
{
Console.WriteLine(uAEx.Message);
}
catch (PathTooLongException pathEx)
{
Console.WriteLine(pathEx.Message);
}
}
}
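One caveat that may be worth checking on your runtime: StartsWith(string) without a StringComparison argument is culture-sensitive, and on newer .NET versions that use ICU a culture-sensitive comparison can treat "\0" as ignorable, in which case the filter would match every line. The sketch below sidesteps that by comparing the first character directly and using ordinal comparisons elsewhere. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ControlCharFilter
{
    // True when the first character of the line is NUL.
    // A char comparison is always ordinal, so culture rules do not apply.
    public static bool StartsWithNul(string line)
    {
        return line.Length > 0 && line[0] == '\0';
    }

    // Mirrors the filters of the LINQ query above: drops blank lines,
    // "#" notes, lines starting with a space, and lines starting with NUL.
    public static List<string> KeepParsableLines(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Where(line => !line.StartsWith("#", StringComparison.Ordinal))
            .Where(line => !line.StartsWith(" ", StringComparison.Ordinal))
            .Where(line => !StartsWithNul(line))
            .ToList();
    }
}
```

Materializing the result with ToList() also means a later Count does not read the files a second time.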
I have been trying really hard to understand regular expressions. Is there any way I can replace the character(s) between two delimiters? For example, I have:
string datax = "a4726e1e-babb-4898-a5d5-e29d2bc40028;POPULATE DATA AØ99c1d133-15f5-4ef5-bc59- d9ed673b70c6;POPULATE DATA BØ";
How can I remove the string between ";" and "Ø"?
I tried code like this:
string xresult = Regex.Replace(datax, @"(?<=;)(\w+?)(?=Ø)", "");
But it is not working.
Please correct it and give me a solution.
Thanks.
I want a result like this:
string datax = "a4726e1e-babb-4898-a5d5-e29d2bc40028;Ø99c1d133-15f5-4ef5-bc59-d9ed673b70c6;Ø";
I think you need to understand regex a little better, and how the Replace function works. With regex you define capture groups, and with the Replace function you replace what they match.
How to remove the string between ";" and "Ø"?
Step 1: First find ";", then capture all characters up to and including "Ø".
That's (;.*?Ø)
( New Capture Group
; Match ";"
. Match Anything
* Zero or more times
? Be Lazy
Ø Match "Ø"
) End Capture
Step 2: Replace each group with ";Ø"
public static string Replace(string input, string pattern, string replacement)
So you need to put back the ";Ø" you removed from the original capture.
static void Test2()
{
foreach (string item in SO2588078())
{
Console.WriteLine(item);
}
string input = "a4726e1e-babb-4898-a5d5-e29d2bc40028;POPULATE DATA AØ99c1d133-15f5-4ef5-bc59- d9ed673b70c6;POPULATE DATA BØ";
string regex = "(;.*?Ø)";
string output = Regex.Replace(input, regex, ";Ø");
if (output == string.Join(";Ø", SO2588078()) + ";Ø")
{
Console.WriteLine("TRUE");
}
}
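For comparison, the asker's lookaround approach can also be made to work. The original attempt failed because \w does not match the spaces inside "POPULATE DATA A". The sketch below uses hypothetical names and shows both routes: matching anything that is not "Ø" between the lookarounds, and the match-and-reinsert approach from this answer.

```csharp
using System.Text.RegularExpressions;

public static class BetweenDelimiters
{
    // Lookaround version: the delimiters are asserted but not consumed,
    // so the replacement can simply be the empty string.
    public static string RemoveWithLookarounds(string input)
    {
        return Regex.Replace(input, "(?<=;)[^Ø]*(?=Ø)", string.Empty);
    }

    // Version from the answer: the delimiters are part of the match,
    // so they have to be put back in the replacement.
    public static string RemoveWithReinsert(string input)
    {
        return Regex.Replace(input, ";.*?Ø", ";Ø");
    }
}
```

Both methods should give the same result on the sample data. The lookaround form avoids having to repeat the delimiters in the replacement string.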
An alternative would be to parse the string without regex. It's a simple format, and this gives you more control over the process: since you can step through it, you can see what's happening, why it's gone wrong, and why it gives the results it does.
private static IEnumerable<string> SO2588078()
{
string datax = "a4726e1e-babb-4898-a5d5-e29d2bc40028;POPULATE DATA AØ99c1d133-15f5-4ef5-bc59- d9ed673b70c6;POPULATE DATA BØ";
string temp = datax;
while (!string.IsNullOrEmpty(temp))
{
int index1 = temp.IndexOf(';');
if (index1 > -1)
{
string guid = temp.Remove(index1);
yield return guid;
int index2 = temp.IndexOf('Ø');
if (index2 > -1)
{
temp = temp.Substring(index2 + 1);
}
else
{
temp = null;
}
}
else
{
temp = null;
}
}
}
I have a text file with a certain format. First comes an identifier followed by three spaces and a colon. Then comes the value for this identifier.
ID1 :Value1
ID2 :Value2
ID3 :Value3
What I need to do is search for, e.g., ID2 : and replace Value2 with a new value, NewValue2. What would be a way to do this? The files I need to parse won't get very large. The largest will be around 150 lines.
If the file isn't that big, you can use File.ReadAllLines to get a collection of all the lines and then replace the line you're looking for, like this:
using System.IO;
using System.Linq;
using System.Collections.Generic;
List<string> lines = new List<string>(File.ReadAllLines("file"));
int lineIndex = lines.FindIndex(line => line.StartsWith("ID2 :"));
if (lineIndex != -1)
{
lines[lineIndex] = "ID2 :NewValue2";
File.WriteAllLines("file", lines);
}
Here's a simple solution which also creates a backup of the source file automatically.
The replacements are stored in a Dictionary object. They are keyed on the line's ID, e.g. 'ID2' and the value is the string replacement required. Just use Add() to add more as required.
StreamWriter writer = null;
Dictionary<string, string> replacements = new Dictionary<string, string>();
replacements.Add("ID2", "NewValue2");
// ... further replacement entries ...
using (writer = File.CreateText("output.txt"))
{
foreach (string line in File.ReadLines("input.txt"))
{
bool replacementMade = false;
foreach (var replacement in replacements)
{
if (line.StartsWith(replacement.Key))
{
writer.WriteLine(string.Format("{0} :{1}",
replacement.Key, replacement.Value));
replacementMade = true;
break;
}
}
if (!replacementMade)
{
writer.WriteLine(line);
}
}
}
File.Replace("output.txt", "input.txt", "input.bak");
You'll just have to replace input.txt, output.txt and input.bak with the paths to your source, destination and backup files.
Ordinarily, for any text searching and replacement, I'd suggest some sort of regular expression work, but if this is all you're doing, that's really overkill.
I would just open the original file and a temporary file; read the original a line at a time, and just check each line for "ID2 :"; if you find it, write your replacement string to the temporary file, otherwise, just write what you read. When you've run out of source, close both, delete the original, and rename the temporary file to that of the original.
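The steps described above could be sketched roughly as follows. The class name, method name, and the ".tmp" suffix for the temporary file are all assumptions made for illustration.

```csharp
using System;
using System.IO;

public static class LineReplacer
{
    // Copies the file line by line into a temporary file, swapping in the
    // new value on lines that start with the given prefix (e.g. "ID2 :"),
    // then replaces the original with the temporary file.
    public static void ReplaceValue(string path, string prefix, string newValue)
    {
        string tempPath = path + ".tmp"; // hypothetical temp name next to the original

        using (StreamReader reader = new StreamReader(path))
        using (StreamWriter writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(line.StartsWith(prefix, StringComparison.Ordinal)
                    ? prefix + newValue
                    : line);
            }
        }

        // Both streams are closed here, so the files can be swapped.
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

A call such as LineReplacer.ReplaceValue("file", "ID2 :", "NewValue2") would then rewrite only the matching line. If a backup of the original is wanted, File.Replace, as used in the earlier answer, can take the place of the delete-and-move step.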
Something like this should work. It's very simple, not the most efficient thing, but for small files, it would be just fine:
private void setValue(string filePath, string key, string value)
{
string[] lines= File.ReadAllLines(filePath);
for(int x = 0; x < lines.Length; x++)
{
string[] fields = lines[x].Split(':');
if (fields[0].TrimEnd() == key)
{
lines[x] = fields[0] + ':' + value;
File.WriteAllLines(filePath, lines); // WriteAllLines needs the target path as its first argument
break;
}
}
}
You can use regex and do it in 3 lines of code
string text = File.ReadAllText("sourcefile.txt");
text = Regex.Replace(text, @"(?i)(?<=^id2\s*?:\s*?)\w*?(?=\s*?$)", "NewValue2",
RegexOptions.Multiline);
File.WriteAllText("outputfile.txt", text);
In the regex, (?i)(?<=^id2\s*?:\s*?)\w*?(?=\s*?$) means: find a line that starts with id2 (case-insensitive), with any amount of whitespace before and after the :, and replace the following value (any alphanumeric characters, excluding punctuation) up to the end of the line. If you want to include punctuation, replace \w*? with .*?
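To try the pattern without touching a file, it can be applied to an in-memory string. This is a small sketch with a hypothetical helper name. It assumes the new value contains no "$" characters, since those have a special meaning in a Regex.Replace replacement string.

```csharp
using System.Text.RegularExpressions;

public static class ValueReplacer
{
    // Replaces only the value on the "ID2" line. The lookbehind asserts the
    // "id2 ... :" prefix without consuming it, so the prefix is left untouched.
    public static string ReplaceId2(string text, string newValue)
    {
        return Regex.Replace(
            text,
            @"(?i)(?<=^id2\s*?:\s*?)\w*?(?=\s*?$)",
            newValue,
            RegexOptions.Multiline);
    }
}
```

RegexOptions.Multiline is what lets ^ and $ match at the start and end of each line rather than only at the ends of the whole text.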
You can use regexes to achieve this.
Regex re = new Regex(@"^ID\d+ :Value(\d+)\s*$", RegexOptions.IgnoreCase | RegexOptions.Compiled);
string[] lines = File.ReadAllLines("mytextfile"); // ReadAllLines returns string[]
foreach (string line in lines) {
string replaced = re.Replace(line, processMatch);
//Now do what you are going to do with the value
}
string processMatch(Match m)
{
string number = m.Groups[1].Value; // the digits captured after "Value"
return String.Format("ID{0} :NewValue{0}", number);
}
I want to replace a character in a string with a string in C#.
I have tried the following.
In the following program, I want to replace the set of characters between the character ':' and the first occurrence of '-' with some other characters.
I am able to extract the set of characters between ':' and the first occurrence of '-'.
Can anyone say how to insert these back into the source string?
string source= "tcm:7-426-8";
string target= "tcm:10-15-2";
int fistunderscore = target.IndexOf("-");
string temp = target.Substring(4, fistunderscore-4);
Response.Write("<BR>"+"temp1:" + temp + "<BR>");
Examples:
source: "tcm:7-426-8" or "tcm:100-426-8" or "tcm:10-426-8"
Target: "tcm:10-15-2" or "tcm:5-15-2" or "tcm:100-15-2"
output: "tcm:10-426-8" or "tcm:5-426-8" or "tcm:100-426-8"
In a nutshell, I want to replace the set of characters between ':' and the first occurrence of '-' with the characters extracted from the same position in another string of the same format.
Can anyone help with how it can be done?
Thank you.
If you want to replace the first ":Number-" from the source with the content from target, you can use the following regex.
var source = "tcm:7-426-8";
var target = "tcm:10-15-2";
var pattern1 = new Regex(@":\d{1,3}-");
if (pattern1.IsMatch(source) && pattern1.IsMatch(target))
{
// Replace the ":7-" part of source with the ":10-" part of target
var res = pattern1.Replace(source, pattern1.Match(target).Value);
// "tcm:10-426-8"
}
Edit: To avoid replacing your string with something empty, add an if-clause before the actual replacement.
Try a regex solution. First, this method takes the source and target strings and performs a regex replace on the first, targeting the first number after the 'tcm:' prefix, which must be anchored to the start of the string. In the MatchEvaluator it executes the same regex again, but on the target string.
static Regex rx = new Regex("(?<=^tcm:)[0-9]+", RegexOptions.Compiled);
public string ReplaceOneWith(string source, string target)
{
return rx.Replace(source, new MatchEvaluator((Match m) =>
{
var targetMatch = rx.Match(target);
if (targetMatch.Success)
return targetMatch.Value;
return m.Value; //don't replace if no match
}));
}
Note that no replacement is performed if the regex doesn't return a match on the target string.
Now run this test (probably need to copy the above into the test class):
[TestMethod]
public void SO9973554()
{
Assert.AreEqual("tcm:10-426-8", ReplaceOneWith("tcm:7-426-8", "tcm:10-15-2"));
Assert.AreEqual("tcm:5-426-8", ReplaceOneWith("tcm:100-426-8", "tcm:5-15-2"));
Assert.AreEqual("tcm:100-426-8", ReplaceOneWith("tcm:10-426-8", "tcm:100-15-2"));
}
I'm not clear on the logic used to decide which bit from which string is used, but still, you should use Split(), rather than mucking about with string offsets:
(note that the Remove(0,4) is there to remove the tcm: prefix)
string[] source = "tcm:90-2-10".Remove(0,4).Split('-');
string[] target = "tcm:42-23-17".Remove(0,4).Split('-');
Now you have the numbers from both source and target in easy-to-access arrays, so you can build the new string any way you want:
string output = string.Format("tcm:{0}-{1}-{2}", source[0], target[1], source[2]);
Here's a version without regex:
string source = "tcm:7-426-8";
string target = "tcm:10-15-2";
int targetBeginning = target.IndexOf("-");
int sourceBeginning = source.IndexOf("-");
string temp = target.Substring(0, targetBeginning);//tcm:10
string result = temp + source.Substring(sourceBeginning, source.Length-sourceBeginning); //tcm:10 + -426-8