How to split a text file by comment symbols in C# [duplicate] - c#

This question already has answers here:
Extract comments from .cs file
(2 answers)
Closed 1 year ago.
I've been trying for a while to split a code file (treated as a text file) by the comments in it.
For example, for the input:
// Hi guys, I am trying to get some help here.
// I really tried to do this alone.
/* But I still search for help
in our best friend Google.*/
I expect to get the output:
Hi guys, I am trying to get some help here.
I really tried to do this alone.
But I still search for help in our best friend Google.
So basically, I want to recognize the comments in the file (marked by // and /* */) and put them in a list, with each comment in a separate element.
I tried to do this with the line: codeFile.Split('//', '/*', '*/');
but with no success (it does not even compile, because '//' is not a valid char literal).
Also, since a /* */ comment can span multiple lines, how can I add the entire string between the delimiters to my list when I am reading the file line by line?
Thanks in advance.

I would do something like this:
Read your file line by line.
Check whether each line matches a regex for your different criteria.
Build a new string (with line breaks if needed) from what your checks tell you. This lets you handle comments that span multiple lines.
Hope this helps.
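The steps above can be sketched in C# roughly as follows. This is a minimal sketch, not a full C# lexer. It assumes that // and /* never appear inside string or character literals, which a real parser would have to track. ExtractComments is a hypothetical helper name. The file is processed one line at a time, and a flag remembers whether the scan is currently inside a /* */ comment, so a multi-line comment is accumulated across lines and joined with spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: collects // and /* */ comments from a sequence of lines.
// Assumption: comment markers never appear inside string or char literals.
List<string> ExtractComments(IEnumerable<string> lines)
{
    var comments = new List<string>();
    var block = new StringBuilder(); // text of the /* */ comment being collected
    bool inBlock = false;            // true while between /* and */

    foreach (string line in lines)
    {
        int pos = 0;
        while (pos < line.Length)
        {
            if (inBlock)
            {
                int end = line.IndexOf("*/", pos, StringComparison.Ordinal);
                if (end < 0)
                {
                    // The comment continues on the next line: keep this part, add a space.
                    block.Append(line.Substring(pos).Trim()).Append(' ');
                    pos = line.Length;
                }
                else
                {
                    block.Append(line.Substring(pos, end - pos).Trim());
                    comments.Add(block.ToString().Trim());
                    inBlock = false;
                    pos = end + 2; // keep scanning after */
                }
                continue;
            }

            int lineComment = line.IndexOf("//", pos, StringComparison.Ordinal);
            int blockComment = line.IndexOf("/*", pos, StringComparison.Ordinal);

            if (lineComment >= 0 && (blockComment < 0 || lineComment < blockComment))
            {
                // A // comment runs to the end of the line.
                comments.Add(line.Substring(lineComment + 2).Trim());
                pos = line.Length;
            }
            else if (blockComment >= 0)
            {
                block.Clear();
                inBlock = true;
                pos = blockComment + 2;
            }
            else
            {
                pos = line.Length; // no more comments on this line
            }
        }
    }
    return comments;
}

string[] sample =
{
    "// Hi guys, I am trying to get some help here.",
    "// I really tried to do this alone.",
    "/* But I still search for help",
    "in our best friend Google.*/"
};

// For a real file, pass File.ReadLines(path) instead of the array.
foreach (string comment in ExtractComments(sample))
    Console.WriteLine(comment);
```

Because the input is consumed as an IEnumerable<string>, the same function works on File.ReadLines(path) without loading the whole file into memory. A block comment that is still open at the end of the input is silently dropped in this sketch.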

Related

File.ReadLines().ToList() has a Count of 1 on a file with 2 blank lines?

I'm using File.ReadLines().ToList() to read a regular text file into a List<string>.
The text file has 2 blank lines, as shown in Notepad++ with 'View all characters' enabled for clarity.
Example code:
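using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

List<string> lines = null;
try {
    // Verbatim string (@) so the backslashes in the path are not treated as escapes
    lines = File.ReadLines(@"C:\path\to\file.txt").ToList();
} catch (Exception e) {
    // code here to handle e
}
Console.WriteLine(lines.Count.ToString());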
This prints "1" to the console.
My question is: why does the list generated by File.ReadLines().ToList() have a Count of only 1 when the file has 2 lines? Is a blank line at the end of the file simply discarded by default? (It seems so.)
Thanks to Hans Passant for the answer. I was hoping he'd post it here, but I'm going to go ahead and do it myself, since this question looks very close to being closed (I'm not sure why) and I think the answer could help others in the future.
Answer: Notepad++ is showing a 2nd line that doesn't actually exist in the file. By opening the file with vim in WSL, I was able to see that there is one (1) line in the file, and no more than that.
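To see this from the C# side, here is a minimal sketch. It assumes the file's only content is a single CR+LF line terminator, which Notepad++ displays as two empty lines. The sketch writes a few different contents to a throwaway temporary file and counts what File.ReadLines returns. CountLines is a hypothetical helper name. A line terminator ends the current line but does not start a new one, so a trailing newline never produces an extra empty line.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: write `content` to a temp file and count the lines
// File.ReadLines reports for it. The temp file is deleted afterwards.
int CountLines(string content)
{
    string path = Path.GetTempFileName();
    try
    {
        File.WriteAllText(path, content);
        return File.ReadLines(path).Count();
    }
    finally
    {
        File.Delete(path);
    }
}

Console.WriteLine(CountLines("\r\n"));     // 1: one empty line, as in the question
Console.WriteLine(CountLines("\r\n\r\n")); // 2
Console.WriteLine(CountLines("abc"));      // 1: a last line without a terminator still counts
```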

Unable to remove a special character from string [duplicate]

This question already has answers here:
Why do we always prefer using parameters in SQL statements?
(7 answers)
Closed 5 years ago.
Background
There is an application where users enter information that is stored in a database. I have a second application that runs every 5 minutes and reads the information entered through the first application. My app grabs all of that information from the database, creates the corresponding document, and places it on a server for the user to retrieve. However, users started having issues with one specific document, where certain functionality was not executing correctly. I traced the issue to a string the user had entered in the entry application: the title column contained "Jame's Bond Story". My application creates the document without any error, but after debugging I identified the following problem.
Problem
I'm not sure how the user did it, but the single quote ' was not really a single quote but some other, visually similar character. I confirmed this by running the following code to see whether I could remove it:
string cleanTitle = BookRec.TitleName.Replace("'","");
However, this did not remove it. When I broke the string into a character array, instead of the apostrophe I got a strange numeric value. So I used this regex to strip every character except word characters (letters, digits, and underscores), periods, and spaces:
string cleanTitle = Regex.Replace(BookRec.TitleName, "[^\\w\\. _]", "");
This has now become a problem because the users want the title to be able to contain the following special characters: ( ) _ , - .
I am looking for a way to filter out every other character, including the kind I ran into this week, and allow only letters, digits, and the 6 characters the users have agreed to. I came up with the following regex:
Regex formula = new Regex(@"^[a-zA-Z0-9_\[])(,\-.'");
However, I get an empty string when I use it to replace characters in the title. I am not a big fan of regex, so I am also open to a substring-based approach.
Appended Information
I cannot access the application that inserts the information into the database. I can only read from the database and then perform actions on the data.
You may want to try something like this:
string cleanTitle = Regex.Replace(BookRec.TitleName, @"[^\u0000-\u007F]+", "");
This removes every character outside the ASCII range U+0000 to U+007F. I'm not sure whether those are the characters causing your problem, but hopefully it points you in the right direction.
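For the requirement stated in the question, a whitelist may be simpler than a blacklist: remove everything that is not explicitly allowed. The following is a minimal sketch under some assumptions. Spaces are allowed (titles contain them). The plain ASCII apostrophe is not allowed (add ' to the class if it should be kept). U+2019 (right single quotation mark) stands in for the unknown look-alike character, since the question does not say which character the user actually typed. CleanTitle is a hypothetical helper name. The loop that prints non-ASCII code points is one way to identify the character that showed up as a "strange numeric value" in the char array.

```csharp
using System;
using System.Text.RegularExpressions;

// Whitelist: drop every character that is NOT an ASCII letter or digit, a space,
// or one of ( ) _ , - .   Assumption: spaces are allowed as well.
// Inside the character class, '-' is escaped (\-) so it is a literal hyphen.
string CleanTitle(string title) =>
    Regex.Replace(title, @"[^a-zA-Z0-9 ()_,.\-]", "");

// U+2019 (right single quotation mark) is just an example of a look-alike apostrophe.
string input = "Jame\u2019s Bond Story (Vol. 1) - draft_2, final";

// Listing non-ASCII code points helps identify the "weird" character.
foreach (char c in input)
    if (c > 127)
        Console.WriteLine($"Non-ASCII character: U+{(int)c:X4}"); // U+2019

Console.WriteLine(CleanTitle(input)); // James Bond Story (Vol. 1) - draft_2, final
```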

Extract all URLs in a free text block using RegEx [duplicate]

This question already has answers here:
Extract Url using Regex
(2 answers)
Closed 8 years ago.
I'm trying to detect all URLs in a free-text block. I'm using .NET's Regex.Matches with the following regex: (http|https)://[^\s "']{4,}
I've used the following input text:
here is a link http://somelink.com
here is a link that I didn't space withhttp://nospacelink.com/something?something=&39358235
http://nospacelink.com/something?something=&12233454
here is a link I already handled.
Here is some secret t&cs you're not allowed to know https://somethingbad.com
Just to be a little annoying I've put in a new address thingy capture type of 'http://somethinginspeechmarks.com' and what are you going to do now?
here is a link http://postTextLink.com at then some post text
Here is a link with a full stop http://alinkwithafullstoplink.com. And then some more.
and I get the following output:
http://somelink.com
http://nospacelink.com?something=&39358235
http://nospacelink.com?something=&12233454
http://alreadyhandledlink.com
https://somethingbad.com
http://somethinginspeechmarks.com
http://postTextLink.com
http://alinkwithafullstoplink.com.
Note the full stop on the last entry. How can I update my regex to say "if there is a full stop at the end, ignore it"?
Also, please note that "Getting parts of a URL (Regex)" has nothing to do with my question, as that question is about how to break down a particular URL. I want to extract multiple complete URLs. Please see my input and current output for clarification!
I have got a regex already that does most of what I want, but isn't quite right. Could you please explain where my approach might be improved?
I would add something like [^\.] to the pattern.
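This says that the last character can't be a full stop.
So with (http|https)://[^\s "']{4,}[^\.] it will try to match all addresses not ending with a full stop. Note that [^\.] can also match a trailing space or quote, so the match may still include the full stop followed by that character.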
Edit:
As pointed out in the comments, this final character class is better: [^.\s"']
Updated:
Consider the following minor change to your pattern, which uses a negative lookbehind so that the match cannot end with a full stop:
(http|https)://[^\s "']{4,}(?<!\.)
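Here is a minimal C# sketch of extracting the URLs with Regex.Matches and a negative lookbehind (?<!\.). The greedy part takes as many non-space, non-quote characters as it can. If the last character taken is a full stop, the engine backtracks by one character. https? is equivalent to (http|https). ExtractUrls is a hypothetical helper name, and the sample text is shortened from the question's input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Greedy URL match; the negative lookbehind (?<!\.) rejects a match whose last
// character is a full stop, forcing the engine to give that character back.
// In a verbatim string, "" is a literal double quote.
const string Pattern = @"https?://[^\s""']{4,}(?<!\.)";

// Hypothetical helper: returns every URL found in the text, in order.
string[] ExtractUrls(string text) =>
    Regex.Matches(text, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

string sample = "here is a link http://somelink.com\n" +
                "Here is a link with a full stop http://alinkwithafullstoplink.com. And then some more.\n" +
                "in quotes 'http://somethinginspeechmarks.com' and more";

foreach (string url in ExtractUrls(sample))
    Console.WriteLine(url);
// http://somelink.com
// http://alinkwithafullstoplink.com
// http://somethinginspeechmarks.com
```

Unlike a trailing [^.\s"'] class, the lookbehind consumes no extra character. Unlike a lookahead such as (?=\.), it does not require a full stop after the URL.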

C# Regex: How to break up plain text string

I have a problem which I am wondering how to solve.
I have a string I read from a PDF file that contains a list of questions.
It's in the format of:
QUESTION NO: 1
xxxxxxx (question text)
A) xxxx (multiple choice) B) xxxx C) xxxx ...
Answer: xxxxx
QUESTION NO: 2
xxxxxxx (question text)
.... (etc)
There are about 200 questions in the list.
I am trying to use Regex to break up the text so each question can be in a separate string.
I've done this before with HTML and XML documents, but those were easy since they have a lot of identifying markers like double quotes, brackets, and parentheses.
But I am clueless as to how to do this with just text. I've tried a lot of combinations, but it just seems like I can't get the right format:
var questionPattern = @"QUESTION NO:(.*)QUESTION NO:";
var questionMatch = Regex.Matches(pdfText, questionPattern, RegexOptions.Singleline);
I was wondering, is there a way to do:
var questionPattern = @"(?<=QUESTION NO:)[^QUESTION NO:]*";
where [^QUESTION NO:]* reads everything after each question header and stops when it reaches the next question header?
Obviously this is the wrong format, but I hope people will understand what I'm trying to get at.
Any help would be greatly appreciated.
Thanks!
This is probably the simplest approach, although it depends on each question ending with an "Answer" line. Lookaheads could remove that dependency, but they make the expression more complicated.
(QUESTION NO: \d+[\S\s]*?Answer.*\n*)
Working example: http://regex101.com/r/nC6yA1
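Another option, sketched below under the assumption that every question starts with a "QUESTION NO: n" header, is a lazy match followed by a lookahead for the next header or the end of the input: (?=QUESTION NO:|\z). This is what [^QUESTION NO:]* in the question was aiming for. A character class such as [^...] excludes individual characters, not a whole phrase. With RegexOptions.Singleline, . also matches newlines. SplitQuestions and the sample text are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Each question runs from its "QUESTION NO: n" header up to (but not including)
// the next header, or to the end of the text (\z). The lazy .*? stops at the
// first position where the lookahead succeeds.
string[] SplitQuestions(string pdfText) =>
    Regex.Matches(pdfText, @"QUESTION NO:\s*\d+.*?(?=QUESTION NO:|\z)", RegexOptions.Singleline)
         .Cast<Match>()
         .Select(m => m.Value.Trim())
         .ToArray();

string text = "QUESTION NO: 1\nWhat is 2+2?\nA) 3 B) 4\nAnswer: B\n" +
              "QUESTION NO: 2\nWhat is 3+3?\nA) 6 B) 7\nAnswer: A\n";

string[] questions = SplitQuestions(text);
Console.WriteLine(questions.Length); // 2
foreach (string q in questions)
    Console.WriteLine(q + "\n---");
```

This does not depend on an "Answer" line. However, it would split incorrectly if the literal text "QUESTION NO:" ever appeared inside a question body.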

Edit certain lines of a text file - c#

I have a text file...
HELLO
GOODBYE
FAREWELL
...and two string variables, requested and newname.
I am trying to write a program that removes 'requested' from the text file and adds 'newname'. However, I don't know the concepts or the code to do so. The only approach I can think of is reading all lines into an array, removing 'requested' from the array, and adding 'newname', but I don't know the code.
I apologize if this is a stupid question; I am new to C#. Help is much appreciated. :)
Your idea should work fine.
Here's a link on how to get the text file contents into a list collection.
http://www.dotnetperls.com/readline
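Then you can call lines.Remove(requested) and lines.Add(newname) on that list (where lines is your List<string>). Note that Remove only removes the first matching line.
No stupid questions btw, only stupid people who post complaints about it :D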
As simple as:
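using System.Collections.Generic;
using System.IO;
using System.Linq;

string path = "file path here";
List<string> lines = File.ReadAllLines(path).ToList();
lines.RemoveAll(line => line.Equals(requested)); // removes every matching line
lines.Add(newname);                              // appended at the end of the file
File.WriteAllLines(path, lines);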
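If newname should take the place of requested at the same position instead of being appended at the end, one option is to map each line, as in this minimal sketch. ReplaceLine is a hypothetical helper, the comparison is exact and case-sensitive, and the demo uses a throwaway temporary file. File.ReadAllLines reads the entire file into memory before File.WriteAllLines overwrites it.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: replace every line equal to `requested` with `newname`,
// keeping the original line order. Comparison is exact and case-sensitive.
void ReplaceLine(string path, string requested, string newname)
{
    string[] updated = File.ReadAllLines(path)
        .Select(line => line == requested ? newname : line)
        .ToArray();
    File.WriteAllLines(path, updated);
}

string demoPath = Path.GetTempFileName(); // throwaway file for the demo
File.WriteAllLines(demoPath, new[] { "HELLO", "GOODBYE", "FAREWELL" });
ReplaceLine(demoPath, "GOODBYE", "SEE YOU");
Console.WriteLine(string.Join(",", File.ReadAllLines(demoPath))); // HELLO,SEE YOU,FAREWELL
File.Delete(demoPath);
```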
