Multiple string replace in c# - c#

I am dynamically editing a regex for matching text in a pdf, which can contain hyphenation at the end of some lines.
Example:
Source string:
"consecuti?vely"
Replace rules:
.Replace("cuti?",#"cuti?(-\s+)?")
.Replace("con",#"con(-\s+)?")
.Replace("consecu",#"consecu(-\s+)?")
Desired output:
"con(-\s+)?secu(-\s+)?ti?(-\s+)?vely"
The replace rules are built dynamically, this is just an example which causes problems.
Whats the best solution to perform such a multiple replace, which will produce the desired output?
So far I thought about using Regex.Replace and zipping the word to replace with optional (-\s+)? between each character, but that would not work, because the word to replace already contains special-meaning characters in regex context.
EDIT: My current code, doesnt work when replace rules overlap like in example above
private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
{
var originalTextOfThePage = mPagesNotModified[searchedPage];
var hyphenatedParts = Regex.Matches(originalTextOfThePage, #"\w+\-\s");
for (int i = 0; i < hyphenatedParts.Count; i++)
{
var partBeforeHyphen = String.Concat(hyphenatedParts[i].Value.TakeWhile(c => c != '-'));
regex = regex.Replace(partBeforeHyphen, partBeforeHyphen + #"(-\s+)?");
}
return regex;
}

the output of this program is "con(-\s+)?secu(-\s+)?ti?(-\s+)?vely";
and as I understand your problem, my code can completely solve your problem.
class Program
{
class somefields
{
public string first;
public string secound;
public string Add;
public int index;
public somefields(string F, string S)
{
first = F;
secound = S;
}
}
static void Main(string[] args)
{
//declaring output
string input = "consecuti?vely";
List<somefields> rules=new List<somefields>();
//declaring rules
rules.Add(new somefields("cuti?",#"cuti?(-\s+)?"));
rules.Add(new somefields("con",#"con(-\s+)?"));
rules.Add(new somefields("consecu",#"consecu(-\s+)?"));
// finding the string which must be added to output string and index of that
foreach (var rul in rules)
{
var index=input.IndexOf(rul.first);
if (index != -1)
{
var add = rul.secound.Remove(0,rul.first.Count());
rul.Add = add;
rul.index = index+rul.first.Count();
}
}
// sort rules by index
for (int i = 0; i < rules.Count(); i++)
{
for (int j = i + 1; j < rules.Count(); j++)
{
if (rules[i].index > rules[j].index)
{
somefields temp;
temp = rules[i];
rules[i] = rules[j];
rules[j] = temp;
}
}
}
string output = input.ToString();
int k=0;
foreach(var rul in rules)
{
if (rul.index != -1)
{
output = output.Insert(k + rul.index, rul.Add);
k += rul.Add.Length;
}
}
System.Console.WriteLine(output);
System.Console.ReadLine();
}
}

You should probably write your own parser, it's probably easier to maintain :).
Maybe you could add "special characters" around pattern in order to protect them like "##" if the strings not contains it.

Try this one:
var final = Regex.Replace(originalTextOfThePage, #"(\w+)(?:\-[\s\r\n]*)?", "$1");

I had to give up an easy solution and did the editing of the regex myself. As a side effect, the new approach goes only twice trough the string.
private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
{
var indexesToInsertPossibleHyphenation = GetPossibleHyphenPositions(regex, searchedPage);
var hyphenationToken = #"(-\s+)?";
return InsertStringTokenInAllPositions(regex, indexesToInsertPossibleHyphenation, hyphenationToken);
}
private static string InsertStringTokenInAllPositions(string sourceString, List<int> insertionIndexes, string insertionToken)
{
if (insertionIndexes == null || string.IsNullOrEmpty(insertionToken)) return sourceString;
var sb = new StringBuilder(sourceString.Length + insertionIndexes.Count * insertionToken.Length);
var linkedInsertionPositions = new LinkedList<int>(insertionIndexes.Distinct().OrderBy(x => x));
for (int i = 0; i < sourceString.Length; i++)
{
if (!linkedInsertionPositions.Any())
{
sb.Append(sourceString.Substring(i));
break;
}
if (i == linkedInsertionPositions.First.Value)
{
sb.Append(insertionToken);
}
if (i >= linkedInsertionPositions.First.Value)
{
linkedInsertionPositions.RemoveFirst();
}
sb.Append(sourceString[i]);
}
return sb.ToString();
}
private List<int> GetPossibleHyphenPositions(string regex, int searchedPage)
{
var originalTextOfThePage = mPagesNotModified[searchedPage];
var hyphenatedParts = Regex.Matches(originalTextOfThePage, #"\w+\-\s");
var indexesToInsertPossibleHyphenation = new List<int>();
//....
// Aho-Corasick to find all occurences of all
//strings in "hyphenatedParts" in the "regex" string
// ....
return indexesToInsertPossibleHyphenation;
}

Related

C# Extract json object from mixed data text/js file

I need to parse reactjs file in main.451e57c9.js to retrieve version number with C#.
This file contains mixed data, here is little part of it:
.....inally{if(s)throw i}}return a}}(e,t)||xe(e,t)||we()}var Se=
JSON.parse('{"shortVersion":"v3.1.56"}')
,Ne="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgA
AASAAAAAqCAYAAAATb4ZSAAAACXBIWXMAAAsTAAALEw.....
I need to extract json data of {"shortVersion":"v3.1.56"}
The last time I tried to simply find the string shortVersion and return a certain number of characters after, but it seems like I'm trying to create the bicycle from scratch. Is there proper way to identify and extract json from the mixed text?
public static void findVersion()
{
var partialName = "main.*.js";
string[] filesInDir = Directory.GetFiles(#pathToFile, partialName);
var lines = File.ReadLines(filesInDir[0]);
foreach (var line in File.ReadLines(filesInDir[0]))
{
string keyword = "shortVersion";
int indx = line.IndexOf(keyword);
if (indx != -1)
{
string code = line.Substring(indx + keyword.Length);
Console.WriteLine(code);
}
}
}
RESULT
":"v3.1.56"}'),Ne="data:image/png;base64,iVBORw0KGgoAA.....
string findJson(string input, string keyword) {
int startIndex = input.IndexOf(keyword) - 2; //Find the starting point of shortversion then subtract 2 to start at the { bracket
input = input.Substring(startIndex); //Grab everything after the start index
int endIndex = 0;
for (int i = 0; i < input.Length; i++) {
char letter = input[i];
if (letter == '}') {
endIndex = i; //Capture the first instance of the closing bracket in the new trimmed input string.
break;
}
}
return input.Remove(endIndex+1);
}
Console.WriteLine(findJson("fwekjfwkejwe{'shortVersion':'v3.1.56'}wekjrlklkj23klj23jkl234kjlk", "shortVersion"));
You will recieve {'shortVersion':'v3.1.56'} as output
Note you may have to use line.Replace('"', "'");
Try below method -
public static object ExtractJsonFromText(string mixedStrng)
{
for (var i = mixedStrng.IndexOf('{'); i > -1; i = mixedStrng.IndexOf('{', i + 1))
{
for (var j = mixedStrng.LastIndexOf('}'); j > -1; j = mixedStrng.LastIndexOf("}", j -1))
{
var jsonProbe = mixedStrng.Substring(i, j - i + 1);
try
{
return JsonConvert.DeserializeObject(jsonProbe);
}
catch
{
}
}
}
return null;
}
Fiddle
https://dotnetfiddle.net/N1jiWH
You should not use GetFiles() since you only need one and that returns all before you can do anything. This should give your something you can work with here and it should be as fast as it likely can be with big files and/or lots of files in a folder (to be fair I have not tested this on such a large file system or file)
using System;
using System.IO;
using System.Linq;
public class Program
{
public static void Main()
{
Console.WriteLine("Hello World");
var path = $#"c:\SomePath";
var jsonString = GetFileVersion(path);
if (!string.IsNullOrWhiteSpace(jsonString))
{
// do something with string; deserialize or whatever.
var result=JsonConvert.DeserializeObject<List<Version>>(jsonString);
var vers = result.shortVersion;
}
}
private static string GetFileVersion(string path)
{
var partialName = "main.*.js";
// JSON string fragment to find: doubled up braces and quotes for the $# string
string matchString = $#"{{""shortVersion"":";
string matchEndString = $#" ""}}'";
// we can later stop on the first match
DirectoryInfo dir = new DirectoryInfo(path);
if (!dir.Exists)
{
throw new DirectoryNotFoundException("The directory does not exist.");
}
// Call the GetFileSystemInfos method and grab the first one
FileSystemInfo info = dir.GetFileSystemInfos(partialName).FirstOrDefault();
if (info.Exists)
{
// walk the file contents looking for a match (assumptions made here there IS a match and it has that string noted)
var line = File.ReadLines(info.FullName).SkipWhile(line => !line.Contains(matchString)).Take(1).First();
var indexStart = line.IndexOf(matchString);
var indexEnd = line.IndexOf(matchEndString, indexStart);
var jsonString = line.Substring(indexStart, indexEnd + matchEndString.Length);
return jsonString;
}
return string.Empty;
}
public class Version
{
public string shortVersion { get; set; }
}
}
Use this this should be faster - https://dotnetfiddle.net/sYFvYj
public static object ExtractJsonFromText(string mixedStrng)
{
string pattern = #"\(\'\{.*}\'\)";
string str = null;
foreach (Match match in Regex.Matches(mixedStrng, pattern, RegexOptions.Multiline))
{
if (match.Success)
{
str = str + Environment.NewLine + match;
}
}
return str;
}

How to reverse an array of strings without changing the position of special characters in C#

I'm working on reversing a sentence. I'm able to do it. But I'm not sure, how to reverse the word without changing the special characters positions. I'm using regex but as soon as it finds the special characters it's stopping the reversal of the word.
Following is the code:
Console.WriteLine("Enter:");
string w = Console.ReadLine();
string rw = String.Empty;
String[] arr = w.Split(' ');
var regexItem = new Regex("^[a-zA-Z0-9]*$");
StringBuilder appendString = new StringBuilder();
for (int i = 0; i < arr.Length; i++)
{
char[] chararray = arr[i].ToCharArray();
for (int j = chararray.Length - 1; j >= 0; j--)
{
if (regexItem.IsMatch(rw))
{
rw = appendString.Append(chararray[j]).ToString();
}
}
sb.Append(' ');
}
Console.WriteLine(rw);
Console.ReadLine();
Example : Input
Marshall! Hello.
Expected output
llahsram! olleh.
A basic solution with regex and LINQ. Try it online.
public static void Main()
{
Console.WriteLine("Marshall! Hello.");
Console.WriteLine(Reverse("Marshall! Hello."));
}
public static string Reverse(string source)
{
// we split by groups to keep delimiters
var parts = Regex.Split(source, #"([^a-zA-Z0-9])");
// if we got a group of valid characters
var results = parts.Select(x => x.All(char.IsLetterOrDigit)
// we reverse it
? new string(x.Reverse().ToArray())
// or we keep the delimiters as it
: x);
// then we concat all of them
return string.Concat(results);
}
The same solution without LINQ. Try it online.
public static void Main()
{
Console.WriteLine("Marshall! Hello.");
Console.WriteLine(Reverse("Marshall! Hello."));
}
public static bool IsLettersOrDigits(string s)
{
foreach (var c in s)
{
if (!char.IsLetterOrDigit(c))
{
return false;
}
}
return true;
}
public static string Reverse(char[] s)
{
Array.Reverse(s);
return new string(s);
}
public static string Reverse(string source)
{
var parts = Regex.Split(source, #"([^a-zA-Z0-9])");
var results = new List<string>();
foreach(var x in parts)
{
results.Add(IsLettersOrDigits(x)
? Reverse(x.ToCharArray())
: x);
}
return string.Concat(results);
}
This is a solution without LINQ. I wasn't sure about what are considered special characters.
string sentence = "Marshall! Hello.";
List<string> words = sentence.Split(' ').ToList();
List<string> reversedWords = new List<string>();
foreach (string word in words)
{
char[] arr = new char[word.Length];
for( int i=0; i<word.Length; i++)
{
if(!Char.IsLetterOrDigit((word[i])))
{
for ( int x=0; x< i; x++)
{
arr[x] = arr[x + 1];
}
arr[i] = word[i];
}
else
{
arr[word.Length - 1 - i] = word[i];
}
}
reversedWords.Add(new string(arr));
}
string reversedSentence = string.Join(" ", reversedWords);
Console.WriteLine(reversedSentence);
And this is the output:
Updated Output = llahsraM! olleH.
Here is a non-regex version that does what you want:
var sentence = "Hello, john!";
var parts = sentence.Split(' ');
var reversed = new StringBuilder();
var charPositions = sentence.Select((c, idx) => new { Char = c, Index = idx })
.Where(_ => !char.IsLetterOrDigit(_.Char));
for (int i = 0; i < parts.Length; i++)
{
var chars = parts[i].ToCharArray();
for (int j = chars.Length - 1; j >= 0; j--)
{
if (char.IsLetterOrDigit(chars[j]))
{
reversed.Append(chars[j]);
}
}
}
foreach (var ch in charPositions)
{
reversed.Insert(ch.Index, ch.Char);
}
// olleH, nhoj!
Console.WriteLine(reversed.ToString());
Basically the trick is to remember the position of special (i.e. non letter or digit) characters and insert them at the end to those positions.
This solution is without LINQ and Regex. It may not be an efficient answer but working properly for small string values.
// This will reverse the string and special characters will just stay there.
public string ReverseString(string rString)
{
StringBuilder ss = new StringBuilder(rString);
int y = 0;
// The idea is to swap values. Like swapping first value with last one. It will keep swapping unless it reaches at the middle of the string where no swapping will be needed.
// This first loop is to detect first values.
for(int i=rString.Length-1;i>=0;i--)
{
// This condition is to check if the values is String or not. If it is not string then it is considered as special character which will just stay there at same old position.
if(Char.IsLetter(Convert.ToChar(rString.Substring(i,1))))
{
// This is second loop which is starting from end to swap values from end with first.
for (int k = y; k < rString.Length; k++)
{
// Again checking last values if values are string or not.
if (Char.IsLetter(Convert.ToChar(rString.Substring(k, 1))))
{
// This is swapping. So st1 is First value in that string
// st2 is the last item in that string
char st1 = Convert.ToChar(rString.Substring(k, 1));
char st2 = Convert.ToChar(rString.Substring(i, 1));
//This is swapping. So last item will go to first position and first item will go to last position, To make sure string is reversed.
// Remember when the string value is Special Character, swapping will move forward without swapping.
ss[rString.IndexOf(rString.Substring(i, 1))] = st1;
ss[rString.IndexOf(rString.Substring(k, 1))] = st2;
y++;
// When the swapping is done for first 2 items. The loop will stop to change the values.
break;
}
else
{
// This is just increment if value was Special character.
y++;
}
}
}
}
return ss.ToString();
}
Thanks!

How to keep only numbers and some special characters in a string?

I have a string containing regular characters, special characters and numbers. I'm trying to remove the regular characters, just keeping the numbers and the special characters. I use a loop to check if a character is a special character or a number. Then, I replace it with an empty string. However, this doesn't seem to work because I get an error "can't apply != to string or char". My code is below. If possible, please give me some ideas to fix this. Thanks.
public string convert_string_to_no(string val)
{
string str_val = "";
int val_len = val.Length;
for (int i = 0; i < val_len; i++)
{
char myChar = Convert.ToChar(val.Substring(i, 1));
if ((char.IsDigit(myChar) == false) && (myChar != "-"))
{
str_val = str_val.replace(str_val.substring(i,1),"");
}
}
return str_val;
}
you can use regular expressions to do that.its faster than using loop and clean
String test ="Q1W2-hjkxas1-EE3R4-5T";
Regex rgx = new Regex("[^0-9-]");
Console.WriteLine(rgx.Replace(test, ""));
check the working code here
It seem try to change "-" to '-', and better to construct the string and not replacing the char.
public string convert_string_to_no(string val)
{
string str_val = "";
int val_len = val.Length;
for (int i = 0; i < val_len; i++)
{
char myChar = Convert.ToChar(val.Substring(i, 1));
if (char.IsDigit(myChar) && myChar == '-')
{
str_val += myChar;
}
}
return str_val;
}
I perfer Linq:
public static class StringExtensions
{
public static string ToLimitedString(this string instance,
string validCharacters)
{
// null reference checking...
var result = new string(instance
.Where(c => validCharacters.Contains(c))
.ToArray());
return result;
}
}
usage:
var test ="Q1W2-hjkxas1-EE3R4-5T";
var limited = test.ToLimitedString("01234567890-");
Console.WriteLine(limited);
result:
12-1-34-5
DotNetFiddle Example

Counting number of words in C#

I'm trying to count the number of words from a rich textbox in C# the code that I have below only works if it is a single line. How do I do this without relying on regex or any other special functions.
string whole_text = richTextBox1.Text;
string trimmed_text = whole_text.Trim();
string[] split_text = trimmed_text.Split(' ');
int space_count = 0;
string new_text = "";
foreach(string av in split_text)
{
if (av == "")
{
space_count++;
}
else
{
new_text = new_text + av + ",";
}
}
new_text = new_text.TrimEnd(',');
split_text = new_text.Split(',');
MessageBox.Show(split_text.Length.ToString ());
char[] delimiters = new char[] {' ', '\r', '\n' };
whole_text.Split(delimiters,StringSplitOptions.RemoveEmptyEntries).Length;
Since you are only interested in word count, and you don't care about individual words, String.Split could be avoided. String.Split is handy, but it unnecessarily generates a (potentially) large number of String objects, which in turn creates an unnecessary burden on the garbage collector. For each word in your text, a new String object needs to be instantiated, and then soon collected since you are not using it.
For a homework assignment, this may not matter, but if your text box contents change often and you do this calculation inside an event handler, it may be wiser to simply iterate through characters manually. If you really want to use String.Split, then go for a simpler version like Yonix recommended.
Otherwise, use an algorithm similar to this:
int wordCount = 0, index = 0;
// skip whitespace until first word
while (index < text.Length && char.IsWhiteSpace(text[index]))
index++;
while (index < text.Length)
{
// check if current char is part of a word
while (index < text.Length && !char.IsWhiteSpace(text[index]))
index++;
wordCount++;
// skip whitespace until next word
while (index < text.Length && char.IsWhiteSpace(text[index]))
index++;
}
This code should work better with cases where you have multiple spaces between each word, you can test the code online.
There are some better ways to do this, but in keeping with what you've got, try the following:
string whole_text = richTextBox1.Text;
string trimmed_text = whole_text.Trim();
// new line split here
string[] lines = trimmed_text.Split(Environment.NewLine.ToCharArray());
// don't need this here now...
//string[] split_text = trimmed_text.Split(' ');
int space_count = 0;
string new_text = "";
Now make two foreach loops. One for each line and one for counting words within the lines.
foreach (string line in lines)
{
// Modify the inner foreach to do the split on ' ' here
// instead of split_text
foreach (string av in line.Split(' '))
{
if (av == "")
{
space_count++;
}
else
{
new_text = new_text + av + ",";
}
}
}
new_text = new_text.TrimEnd(',');
// use lines here instead of split_text
lines = new_text.Split(',');
MessageBox.Show(lines.Length.ToString());
}
This was a phone screening interview question that I just took (by a large company located in CA who sells all kinds of devices that starts with a letter "i"), and I think I franked... after I got offline, I wrote this. I wish I were able to do it during interview..
static void Main(string[] args)
{
Debug.Assert(CountWords("Hello world") == 2);
Debug.Assert(CountWords(" Hello world") == 2);
Debug.Assert(CountWords("Hello world ") == 2);
Debug.Assert(CountWords("Hello world") == 2);
}
public static int CountWords(string test)
{
int count = 0;
bool wasInWord = false;
bool inWord = false;
for (int i = 0; i < test.Length; i++)
{
if (inWord)
{
wasInWord = true;
}
if (Char.IsWhiteSpace(test[i]))
{
if (wasInWord)
{
count++;
wasInWord = false;
}
inWord = false;
}
else
{
inWord = true;
}
}
// Check to see if we got out with seeing a word
if (wasInWord)
{
count++;
}
return count;
}
Have a look at the Lines property mentioned in #Jay Riggs comment, along with this overload of String.Split to make the code much simpler. Then the simplest approach would be to loop over each line in the Lines property, call String.Split on it, and add the length of the array it returns to a running count.
EDIT: Also, is there any reason you're using a RichTextBox instead of a TextBox with Multiline set to True?
I use an extension method for grabbing word count in a string. Do note, however, that double spaces will mess the count up.
public static int CountWords(this string line)
{
var wordCount = 0;
for (var i = 0; i < line.Length; i++)
if (line[i] == ' ' || i == line.Length - 1)
wordCount++;
return wordCount;
}
}
Your approach is on the right path. I would do something like, passing the text property of richTextBox1 into the method. This however won't be accurate if your rich textbox is formatting HTML, so you'll need to strip out any HTML tags prior to running the word count:
public static int CountWords(string s)
{
int c = 0;
for (int i = 1; i < s.Length; i++)
{
if (char.IsWhiteSpace(s[i - 1]) == true)
{
if (char.IsLetterOrDigit(s[i]) == true ||
char.IsPunctuation(s[i]))
{
c++;
}
}
}
if (s.Length > 2)
{
c++;
}
return c;
}
We used an adapted form of Yoshi's answer, where we fixed the bug where it would not count the last word in a string if there was no white-space after it:
public static int CountWords(string test)
{
int count = 0;
bool inWord = false;
foreach (char t in test)
{
if (char.IsWhiteSpace(t))
{
inWord = false;
}
else
{
if (!inWord) count++;
inWord = true;
}
}
return count;
}
using System.Collections;
using System;
class Program{
public static void Main(string[] args){
//Enter the value of n
int n = Convert.ToInt32(Console.ReadLine());
string[] s = new string[n];
ArrayList arr = new ArrayList();
//enter the elements
for(int i=0;i<n;i++){
s[i] = Console.ReadLine();
}
string str = "";
//Filter out duplicate values and store in arr
foreach(string i in s){
if(str.Contains(i)){
}else{
arr.Add(i);
}
str += i;
}
//Count the string with arr and s variables
foreach(string i in arr){
int count = 0;
foreach(string j in s){
if(i.Equals(j)){
count++;
}
}
Console.WriteLine(i+" - "+count);
}
}
}
int wordCount = 0;
bool previousLetterWasWhiteSpace = false;
foreach (char letter in keyword)
{
if (char.IsWhiteSpace(letter))
{
previousLetterWasWhiteSpace = true;
}
else
{
if (previousLetterWasWhiteSpace)
{
previousLetterWasWhiteSpace = false;
wordCount++;
}
}
}
public static int WordCount(string str)
{
int num=0;
bool wasInaWord=true;;
if (string.IsNullOrEmpty(str))
{
return num;
}
for (int i=0;i< str.Length;i++)
{
if (i!=0)
{
if (str[i]==' ' && str[i-1]!=' ')
{
num++;
wasInaWord=false;
}
}
if (str[i]!=' ')
{
wasInaWord=true;
}
}
if (wasInaWord)
{
num++;
}
return num;
}
class Program
{
static void Main(string[] args)
{
string str;
int i, wrd, l;
StringBuilder sb = new StringBuilder();
Console.Write("\n\nCount the total number of words in a string
:\n");
Console.Write("---------------------------------------------------
---\n");
Console.Write("Input the string : ");
str = Console.ReadLine();
l = 0;
wrd = 1;
foreach (var a in str)
{
sb.Append(a);
if (str[l] == ' ' || str[l] == '\n' || str[l] == '\t')
{
wrd++;
}
l++;
}
Console.WriteLine(sb.Replace(' ', '\n'));
Console.Write("Total number of words in the string is : {0}\n",
wrd);
Console.ReadLine();
}
This should work
input.Split(' ').ToList().Count;
This can show you the number of words in a line
string line = Console.ReadLine();
string[] word = line.Split(' ');
Console.WriteLine("Words " + word.Length);
You can also do it in this way!! Add this method to your extension methods.
public static int WordsCount(this string str)
{
return Regex.Matches(str, #"((\w+(\s?)))").Count;
}
And call it like this.
string someString = "Let me show how I do it!";
int wc = someString.WordsCount();

Remove HTML tags and comments from a string in C#?

How do I remove everything beginning in '<' and ending in '>' from a string in C#. I know it can be done with regex but I'm not very good with it.
The tag pattern I quickly wrote for a recent small project is this one.
string tagPattern = #"<[!--\W*?]*?[/]*?\w+.*?>";
I used it like this
MatchCollection matches = Regex.Matches(input, tagPattern);
foreach (Match match in matches)
{
input = input.Replace(match.Value, string.Empty);
}
It would likely need to be modified to correctly handle script or style tags.
Non regex option: But it still won't parse nested tags!
public static string StripHTML(string line)
{
int finished = 0;
int beginStrip;
int endStrip;
finished = line.IndexOf('<');
while (finished != -1)
{
beginStrip = line.IndexOf('<');
endStrip = line.IndexOf('>', beginStrip + 1);
line = line.Remove(beginStrip, (endStrip + 1) - beginStrip);
finished = line.IndexOf('<');
}
return line;
}
Another non-regex code that works 8x faster than regex:
public static string StripTagsCharArray(string source)
{
char[] array = new char[source.Length];
int arrayIndex = 0;
bool inside = false;
for (int i = 0; i < source.Length; i++)
{
char let = source[i];
if (let == '<')
{
inside = true;
continue;
}
if (let == '>')
{
inside = false;
continue;
}
if (!inside)
{
array[arrayIndex] = let;
arrayIndex++;
}
}
return new string(array, 0, arrayIndex);
}

Categories