Search and Replace inside a List with C#

I am new to programming. At this stage I am using automation software that supports C# and JS.
Is it possible to search each line of a list and replace a word?
Example data
List name: A
Sample data a
Sample data b
Sample data c
I want C# code that changes "a" to "x1" wherever it appears.
The code below is the closest I have found, but it removes the whole line and replaces it with x1. My goal is to replace only a particular word.
If there is a way to define multiple matches, that would be great.
a > x1
b > x2
c > x3
The code I found does search and replace, but it replaces the whole line that contains the match.
In the example below, any line that contains a number is replaced with 1.
I found the code in this forum.
var sourceList = project.Lists["A"]; // define list name.
var parserRegex = new Regex("\\d{1,2}"); // it will match all numbers
lock(SyncObjects.ListSyncer)
{
for(int i=0; i < sourceList.Count; i++) // loop through each line.
{
if (parserRegex.IsMatch(sourceList[i])) // to check if there is a match
{
sourceList[i]="1"; // This does the replacing, but it replaces the whole line, not just the matched text.
}
}
}
With your help, I think I have got what I wanted.
Here is my final code, which works for now. It is not perfect, but it serves my purpose.
My remaining question is how to make the replacement of "a" affect only the standalone word, that is,
turn "Sample data a" into "Sample data x1"
but not into "Sx1mple dx1tx1 x1".
Code:
var sourceList = project.Lists["A-Source"]; // define list name.
var parserRegex = new Regex("data"); // matches the literal text "data" (not used below)
lock(SyncObjects.ListSyncer)
{
for(int i=0; i < sourceList.Count; i++) // loop through each line.
{
// if (parserRegex.IsMatch(sourceList[i])) // to check if there is a match. This line is commented out and it still works.
{
sourceList[i]= sourceList[i].Replace("a", "x1")
.Replace("b","x2")
.Replace("c","x3")
.Replace("<p>","")
.Replace("<strong>",""); // I added two more calls to strip the <p> and <strong> tags, and it works!
}
}
}
In each replace pair, the left part is the target and the right part is the replacement text.
In real data the targets can't just be "a", "b" or "c", because "a" would then replace not only the standalone a but also the letter "a" inside words like "data".
C# is powerful, thanks for the generous input!

As Johannes mentioned in the comments, in C# you can use String.Replace():
sourceList[i] = sourceList[i].Replace("a", "x1");

This should work:
var sourceList = project.Lists["A"]; // define list name.
string pattern = @"(?'matched'\d*)";
for (int i = 0; i < sourceList.Count; i++) // loop through each line.
{
foreach (Match m in Regex.Matches(sourceList[i], pattern))
{
Group g = m.Groups["matched"];
if (!string.IsNullOrEmpty(g.Value))
{
sourceList[i] = sourceList[i].Replace(g.Value, "newvalue");
}
}
}
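To replace only standalone words (so that "a" becomes "x1" without touching the "a" in "data"), one option is a regular expression with word boundaries (\b) and a lookup table for the pairs. The following is a minimal sketch, not the automation tool's own API: the class name WordReplacer, the method ReplaceWords and the map variable are made up for illustration. It assumes the targets are plain words; targets such as "<p>" start and end with non-word characters, so \b will not behave as intended for them and String.Replace remains the simpler choice there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class WordReplacer
{
    // Replaces whole words only, using the pairs in 'map' (target -> replacement).
    // Hypothetical helper for illustration; matching is case-sensitive.
    public static string ReplaceWords(string line, IDictionary<string, string> map)
    {
        // Builds a pattern such as \b(?:a|b|c)\b.
        // Regex.Escape guards targets that contain regex metacharacters.
        string pattern = @"\b(?:" + string.Join("|", map.Keys.Select(Regex.Escape)) + @")\b";

        // Each match is looked up in the map, so all pairs are applied in one pass
        // and a replacement is never itself replaced by a later pair.
        return Regex.Replace(line, pattern, m => map[m.Value]);
    }
}
```

Inside the existing loop this would be used as sourceList[i] = WordReplacer.ReplaceWords(sourceList[i], map); where map is, for example, new Dictionary<string, string> { { "a", "x1" }, { "b", "x2" }, { "c", "x3" } }.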

Related

How to programmatically count the occurrence of regular expression as quickly as text editor

I have written a C# program that opens a particular directory. It then opens each file in that directory and counts every occurrence of the regular expression @"^CLM". The program takes the count for each file and places it into a separate cell in a spreadsheet. The code I am using is below:
List<string> linesPost = System.IO.File.ReadAllLines(diPostFiles + curPostFile).ToList();
int y = 0;
for (int i = linesPost.Count - 1; i >= 0; i--)
{
string pattern = @"^CLM";
Match m = Regex.Match(linesPost[i], pattern);
while (m.Success)
{
y++;
break;
}
(xlRange.Cells[startRow + x, 3] as Excel.Range).Value2 = y;
}
This does the work, but it takes a long time. If I open a given file in Notepad++, for example, and put in the same regular expression then hit the count button, I get the result very quickly.
Is there a more efficient way to count the instances of the regular expression? I am anticipating roughly 5,000 occurrences per text file. The overall size of each text file is roughly 5 MB.
Any help is greatly appreciated.
First and foremost, you do not need any regex. You are just checking if each line starts with CLM.
Instead of
string pattern = @"^CLM";
Match m = Regex.Match(linesPost[i], pattern);
while (m.Success)
{
y++;
break;
}
You may just use
if (linesPost[i].StartsWith("CLM"))
y++;
If you store the CLM prefix in a variable, assign it before the loop, since it does not change while the loop runs.
Also, one line in the loop uses early binding with Excel interop. I suggest using late binding or dynamic types to work with Excel objects, and doing so after the loop. Right now you access Excel inside the loop, and that may take a lot of time. Add a List<string> variable before the loop, collect the values, and then insert them into Excel after they are all collected.
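A minimal sketch of the advice above, assuming the only goal is the number of lines that begin with CLM. The class and method names (ClmCounter, CountClmLines) are made up for illustration. The method takes any sequence of lines, so it can be fed from File.ReadLines, which streams the file instead of loading it into a list first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ClmCounter
{
    // Counts the lines that start with "CLM".
    // StringComparison.Ordinal avoids culture-sensitive comparison,
    // which is slower and unnecessary for a fixed ASCII prefix.
    public static int CountClmLines(IEnumerable<string> lines)
    {
        return lines.Count(line => line.StartsWith("CLM", StringComparison.Ordinal));
    }
}
```

With the question's variables this could be called as int y = ClmCounter.CountClmLines(File.ReadLines(diPostFiles + curPostFile)); and the Excel cell would then be written once per file, after the count is known.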
If you want speed, read in the entire file into a string variable.
Then run the regex on it, something like below.
This is the fastest way this can be done for 2 reasons.
1. The lines are continuous, not split into an array.
2. Regex engine code stays in the lowest level until it finds a match.
(i.e. it will return a match possibly hundreds of lines apart from the last one)
note - You did say speed. If you don't want speed, then don't use this way.
int y = 0;
string allLines = @"read the whole file into 'string'";
Regex RxCounter = new Regex(@"(?m)^CLM"); // Using the (?m) multi-line modifier option, inline.
// If .NET does not recognise this inline option,
// set it in the options field of the constructor.
Match _m = RxCounter.Match( allLines );
while (_m.Success)
{
y++;
_m = _m.NextMatch();
}
// Write the total once, after counting, instead of touching Excel on every match.
(xlRange.Cells[startRow + x, 3] as Excel.Range).Value2 = y;
You can construct the Regex outside of the loop (var r = new Regex(pattern, ...)) and just apply it inside (r.Match(...)). This alone should give you some speed-up, because the pattern does not need to be compiled over and over again.
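The two regex suggestions above can be combined: build the Regex once and run it over the whole file text with the multiline option. This is a sketch under those assumptions; the names ClmRegexCounter and Count are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class ClmRegexCounter
{
    // Built once and reused for every file.
    // RegexOptions.Multiline makes ^ match at the start of every line,
    // which is the same as the inline (?m) option.
    private static readonly Regex LineStart =
        new Regex("^CLM", RegexOptions.Multiline | RegexOptions.Compiled);

    // Counts the lines in 'text' that begin with "CLM".
    public static int Count(string text)
    {
        return LineStart.Matches(text).Count;
    }
}
```

The text would typically come from File.ReadAllText(path). Whether this beats the StartsWith approach on 5 MB files is something to measure rather than assume.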

The best way to build 4-letter words from a string, with limits?

I'm using the code below to find all the words that can be spelled from the alphabet variable. The problem is that I rebuild this variable on every call from the board of random letters in front of the user, and the results include strings such as "aaab" that reuse a letter.
What I'm after is code that uses each letter only as many times as it appears in the alphabet variable, so that it can't produce "aaab" but can produce "ab".
I understand that this code, which I found in another thread, builds every 4-letter arrangement of the letters.
I'm wondering if there's a simple way, using SelectMany or Select, to skip a letter that has already been used. Keep in mind that the alphabet variable may contain the same letter several times: if there are 2 A's in it, it should still be able to produce AAB, just not AAAB. I am a newbie. I know that I could go through my own list and add letters together based on how many times they actually exist in the alphabet string; I'm just wondering if there's a way to interrupt i or x and not add to q if the letter has already been used.
Sorry if this is confusing. Thank you :)
// I found this in another thread and seemed to work great and fast.
var alphabet = "abcd";
var q = alphabet.Select(x => x.ToString());
int size = 4;
for (int i = 0; i < size - 1; i++)
q = q.SelectMany(x => alphabet, (x, y) => x + y);
foreach (var item in q)
{
// do stuff with item
}
To reach your goal, you must find a way to mark letters in your alphabet which are already used and avoid using these letters a second time.
To do so you need a data structure which can store more than the letters alone, so a list of letters (or a string) is not sufficient.
Try to build a list of objects of a class like this one:
class UsedLetter
{
public char letter; // the letter itself
public bool used;   // true once the letter has been drawn
}
Then you can mark each letter as used after you drew it from the list.
Improvement
You may also store your alphabet as a list of characters:
List<char> alphabet;
and remove each letter from the alphabet after it is drawn.
Here's how I have achieved what I think you're after:
using System;
using System.Collections.Generic;
using System.Linq;
namespace WordPerms
{
class Program
{
Stack<char> chars = new Stack<char>();
List<string> words = new List<string>();
static void Main(string[] args)
{
Program p = new Program();
p.GetChar("abad");
foreach (string word in p.words)
{
Console.WriteLine(word);
}
}
// This is called recursively to build the list of words.
private void GetChar(string alpha)
{
string beta;
for (int i = 0; i < alpha.Length; i++)
{
chars.Push(alpha[i]);
beta = alpha.Remove(i, 1);
GetChar(beta);
}
char[] charArray = chars.Reverse().ToArray();
words.Add(new string(charArray));
if (chars.Count() >= 1)
{
chars.Pop();
}
}
}
}
Hope that helps, Greg.
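Another way to respect the letter limits is to count how many copies of each letter are available and to decrement that count while a letter is in use. The sketch below is an illustration under that assumption, not the code from either answer; the names LimitedWords, Build and Extend are made up. It returns only words of exactly the requested length, and each distinct word appears once even when the alphabet contains repeated letters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LimitedWords
{
    // Returns every distinct arrangement of 'size' letters in which no letter
    // is used more often than it occurs in 'alphabet'.
    public static List<string> Build(string alphabet, int size)
    {
        char[] letters = alphabet.Distinct().OrderBy(c => c).ToArray();
        // remaining[i] = how many copies of letters[i] are still unused.
        int[] remaining = letters.Select(c => alphabet.Count(x => x == c)).ToArray();
        var results = new List<string>();
        Extend(letters, remaining, new char[size], 0, results);
        return results;
    }

    private static void Extend(char[] letters, int[] remaining,
                               char[] buffer, int pos, List<string> results)
    {
        if (pos == buffer.Length)
        {
            results.Add(new string(buffer));
            return;
        }
        for (int i = 0; i < letters.Length; i++)
        {
            if (remaining[i] == 0) continue; // every copy of this letter is in use
            remaining[i]--;                  // take one copy
            buffer[pos] = letters[i];
            Extend(letters, remaining, buffer, pos + 1, results);
            remaining[i]++;                  // put it back for the next branch
        }
    }
}
```

For example, LimitedWords.Build("aab", 3) yields aab, aba and baa, and never a word with three a's.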

Get the line that starts with some number

I have to process a file, but I only need to pick the last line of the file and check whether it begins with the number 9. How can I do this using LINQ?
The record that begins with the number 9 is sometimes not the last line of the file, because the last line can be an empty line (just \r\n).
I wrote a simple routine to do this:
var lines = File.ReadAllLines(file);
for (int i = 0; i < lines.Length; i++)
{
if (lines[i].StartsWith("9"))
{
//...
}
}
But I want to know whether it is possible to do this faster or, better, using LINQ. :)
string output=File.ReadAllLines(path)
.Last(x=>!Regex.IsMatch(x,@"^[\r\n]*$"));
if(output.StartsWith("9"))
{
//found
}
The other answers are fine, but the following is more intuitive to me (I love self-documenting code):
Edit: misinterpreted your question, updating my example code to be more appropriate
var nonEmptyLines =
from line in File.ReadAllLines(path)
where !String.IsNullOrEmpty(line.Trim())
select line;
if (nonEmptyLines.Any())
{
var lastLine = nonEmptyLines.Last();
if (lastLine.StartsWith("9")) // or char.IsDigit(lastLine.First()) for 'any number'
{
// Your logic here
}
}
You don't need LINQ; something like the following should work:
var fileLines = File.ReadAllLines("yourpath");
if(char.IsDigit(fileLines[fileLines.Length - 1][0]))
{
//last line starts with a digit.
}
Or for checking against specific digit 9 you can do:
if(fileLines.Last().StartsWith("9"))
if(list.Last(x =>!string.IsNullOrWhiteSpace(x)).StartsWith("9"))
{
}
Since you need to check the last two lines (in case the last line is a newline), you can do this. You can change lines to however many last lines you want to check.
int lines = 2;
if(File.ReadLines(file).Reverse().Take(lines).Any(x => x.StartsWith("9")))
{
//one of the last X lines starts with 9
}
else
{
//none of the last X lines start with 9
}
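The answers above share one idea: skip trailing blank lines, then test the last remaining line. A small sketch of that idea which also copes with an empty file follows; the names LastLineCheck and LastNonBlankStartsWith are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LastLineCheck
{
    // True when the last line that is not empty or whitespace starts with 'prefix'.
    // LastOrDefault returns null instead of throwing when no such line exists.
    public static bool LastNonBlankStartsWith(IEnumerable<string> lines, string prefix)
    {
        string last = lines.LastOrDefault(l => !string.IsNullOrWhiteSpace(l));
        return last != null && last.StartsWith(prefix, StringComparison.Ordinal);
    }
}
```

It could be called as LastLineCheck.LastNonBlankStartsWith(File.ReadLines(file), "9"). Note that this still reads the whole file once, so it is tidier than the loop rather than faster.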

Select text from line A to line B in richtextbox

I'm looking for a way to select text between two lines (A and B) in a RichTextBox.
I tried something like this:
richTextBox1.Select
(
richTextBox1.GetFirstCharIndexFromLine(parentesi_inizio[current_idx]),
(
richTextBox1.GetFirstCharIndexFromLine(parentesi_fine[current_idx]) -
richTextBox1.GetFirstCharIndexFromLine(parentesi_inizio[current_idx]) + 1
)
);
parentesi_inizio and parentesi_fine hold the line numbers; I should select from line A (parentesi_inizio) to line B (parentesi_fine).
After some tests I think the problem is this:
richTextBox1.GetFirstCharIndexFromLine(parentesi_fine[current_idx]) -
richTextBox1.GetFirstCharIndexFromLine(parentesi_inizio[current_idx]) + 1
This code works fine at first, but I noticed that after a while it starts to give strange results.
I did further testing and the line numbers are correct (that is, they refer to the right spot), whereas Select does not select the entire portion or selects incorrectly (including parts that should not be selected).
(I used Google Translate for the last part.)
EDIT:
Imagine this text:
Hello
World || A points here (Line 1)
Guys!
This
is
a || B points here (Line 5)
line
I need to Select (not get text) this text:
World
Guys!
This
is
a
in the richtextbox.
That is what I want, and what the code I posted does, but after a while the code starts to misbehave, as I said at the start.
EDIT 2:
I changed my code to this after varocarbas's reply:
richTextBox1.Select (
richTextBox1.GetFirstCharIndexFromLine(parentesi_inizio[current_idx]),
richTextBox1.GetFirstCharIndexFromLine(parentesi_inizio[current_idx])
+ count_length(parentesi_inizio[current_idx], parentesi_fine[current_idx]) );
where count_length is
private int count_length(int A, int B)
{
// A => first line
// B => last line
int tot = 0;
for (int i = A; i <= B; ++i)
{
// read the length of every line between A and B
tot += richTextBox1.Lines[i].Length - 1;
}
// return it
return tot;
}
but now the code does not work in every case. Anyway, here is a screenshot of a bugged case using the old code (the code posted at the start of the question):
(source: site11.com)
It selects correctly from the first { but does not reach the last } (I did some checks here and the problem is the subtraction, not the line numbers).
EDIT 3:
I'm really sorry, varocarbas, I think I just wasted your time. After looking at my screenshot I noticed the problem might be word wrap; I tried disabling it and it now seems to work. Sorry for taking your time.
Why not rely on the Lines[] array directly?
string line2 = richTextBox1.Lines[1];
Bear in mind that each line has its own indexing; that is, to get the first character of the third line you can do:
string firstChar3 = richTextBox1.Lines[2].Substring(0, 1);
To refer to the indexing system of the whole RichTextBox, you can also rely on GetFirstCharIndexFromLine. That is:
int startIndexLine2 = richTextBox1.GetFirstCharIndexFromLine(1); //Start index line2
int endIndexLine2 = startIndexLine2 + richTextBox1.Lines[1].Length - 1; //End index line2
-------- AFTER UPDATED QUESTION
Sorry about that, but I cannot see the code in the links you provided. The code below should deliver the outputs you want:
int curStart = richTextBox1.GetFirstCharIndexFromLine(2);
richTextBox1.Select(curStart, richTextBox1.Lines[2].Length);
string curText = richTextBox1.SelectedText; // "Guys!"
curStart = richTextBox1.GetFirstCharIndexFromLine(3);
richTextBox1.Select(curStart, richTextBox1.Lines[3].Length);
curText = richTextBox1.SelectedText; // "This"
curStart = richTextBox1.GetFirstCharIndexFromLine(4);
richTextBox1.Select(curStart, richTextBox1.Lines[4].Length);
curText = richTextBox1.SelectedText; // "is"
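Since the asker traced the problem to word wrap, one way to sidestep it is to compute the selection purely from the Lines array, which holds the logical lines, rather than from GetFirstCharIndexFromLine. The sketch below is a hedged illustration: LineRange and FromLines are made-up names, and it assumes every line break counts as exactly one character in the control's text, which should be checked against the actual control.

```csharp
using System;

static class LineRange
{
    // Returns the start index and length of a selection covering lines a..b
    // (inclusive, zero-based). Assumption: each line break is one character.
    public static (int Start, int Length) FromLines(string[] lines, int a, int b)
    {
        int start = 0;
        for (int i = 0; i < a; i++)
        {
            start += lines[i].Length + 1; // the line plus its line break
        }

        int length = 0;
        for (int i = a; i <= b; i++)
        {
            length += lines[i].Length;
        }
        length += b - a; // line breaks between the selected lines

        return (start, length);
    }
}
```

With the example text from the question, lines 1 to 5 give a start of 6 and a length of 21, so the call would be var r = LineRange.FromLines(richTextBox1.Lines, 1, 5); richTextBox1.Select(r.Start, r.Length);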

How to check if letters are in a string?

It's quite a hard question to ask, but I will try.
I have my 4 letters: m u g o. I also have some arbitrary word(s).
Let's say: og ogg muogss. I am looking for a sensible method to check whether I can construct the word(s) using only my letters. Please note that once we have used g, we won't be able to use it again.
og - possible because we need only **g** and **o**
ogg - not possible: we took **o** and **g**, and need a second **g**
muogss - not possible: we took all the letters, and also need an additional **s**
So my tactic is to put my letters into a char array, remove them one by one, and check how many are left to build the word(s). But is it possible to do this in a few lines, perhaps with a regex?
Your method is only a few lines...
public static bool CanBeMadeFrom(string word, string letters)
{
foreach (var i in word.Select(c => letters.IndexOf(c, 0)))
{
if (i == -1) return false;
letters = letters.Remove(i, 1);
}
return true;
}
Here's a simple approach:
For your source word, create an array of size 26 and use it to count how many times each letter appears.
Do the same for each word in your dictionary.
Then compare the two.
If every letter occurs less than or equal to as many times in the dictionary word as the source word, then it can be used to make that word. If not, then it cannot.
C# version (these methods belong inside a class; requires using System.Collections.Generic):
/** Converts a letter to a 0 to 25 code representing its alphabet position.
This is specific to the English alphabet and would need to be modified
for other languages. Returns -1 for anything that is not A-Z. */
static int CharToLetter(char c) {
int code = char.ToUpperInvariant(c) - 'A';
return (code >= 0 && code < 26) ? code : -1;
}
/** Counts how many times each letter A-Z occurs in a word.
Characters outside A-Z are ignored. */
static int[] CountLetters(string word) {
int[] counts = new int[26]; // Elements are initialized to 0 automatically
foreach (char c in word) {
int code = CharToLetter(c);
if (code >= 0) counts[code]++;
}
return counts;
}
/** Given a source word and an array of other words to check, returns all
words from the array which can be made from the letters of the source word. */
static List<string> CheckSubWords(string source, string[] dictionary) {
List<string> output = new List<string>();
// Stores how many of each letter are in the source word.
int[] sourceCount = CountLetters(source);
foreach (string s in dictionary) {
// Stores how many of each letter are in the dictionary word.
int[] dictCount = CountLetters(s);
// Then we check that there exist no letters which appear more in the
// dictionary word than the source word.
bool isSubWord = true;
for (int i = 0; i < 26; i++) {
if (dictCount[i] > sourceCount[i]) {
isSubWord = false;
break;
}
}
// If they're all less than or equal to, then we add it to the output.
if (isSubWord) {
output.Add(s);
}
}
return output;
}
If your definition of a word is any arbitrary permutation of the available characters, then why do you need a regex? Just make sure you use each character once. A regex doesn't know what a "correct word" is, and it's better for your algorithm to avoid using invalid characters in the first place than to use them AND then use a regex to make sure you didn't.
