How can I delete all the comments in my code? - c#

I'm new to C#, and want to develope a program with which I could delete the comments after // in my code. Is there any simple code recommended for this purpose?

It has been suggested that you just search for "//" and trim.
Because you have limited yourself to single-line commands this seems like a relatively simple exercise however it has some tricky cases you need to be thinking about if you intend for the output of the program to be a valid C# application with identical behavior to the input program.
Here are some examples where just searching for "//" and trimming won't work.
Comment in Literal:
string foo = "this is // not a comment";
Comment in Comment
/* you should not trim // this one */
Comment in Comment Part Deux
// This is a comment // so don't just remove this!
Multi-line Comment Adjacency
/* you should not *//* trim this these */
There are certainly other edge cases but these are some low-hanging fruit to think about.

First point, this seems like a bad idea. Comments are useful.
Taking it as an exercise,
Edit: This is a simple solution that will fail on all the case #Bubbafat mentions (and propbably some more). It would still work OK on most source files.
read the text one line at a time.
find the last occurrence of //, if any using String.LastIndexOf()
remove the text after (including) the '//' when found
write the line to the output
ad 1: You can open an TextReader using System.IO.File.OpenText(), or File.ReadLines() if you can use Fx4
Also open an output file using System.IO.File.WriteText()
ad 3: int pos = line.LastIndexOf("//"); if (pos >= 0) { line = line.Substring(0, pos); }

Related

Regex hangs trying to find match

I am trying to match an assignment string in VB code (as in I'm passing in text that is VB code into my program that's written in C#). The assignment string that I'm trying to match is something for example like
CustomClassInitializer(someParameter, anotherParameter, someOtherClassAsParameterWithInitialization()).SomeProperty = 7
and I realize that's rather complex, but it actually isn't far off from some of the real text I'm trying to match.
In order to do so I wrote a Regex. This Regex:
#"[\w,.]+\(([\w,.]*\(*,* *\)*)+ = "
which correctly matches. The problem is it becomes VERY slow (with timeouts), which I've researched and found is probably because of "backtracking". One of the suggested solutions to help with backtracking in general was to add "?>" to the regex, which I think would go in this position:
[\w,.]+\(?>([\w,.]*\(*,* *\)*)+ =
but this no longer matches properly.
I'm fairly new to Regex, so I imagine that there is a much better pattern. What is it please? Or how can I improve my times in general?
Helpful notes:
I'm only interested in position 0 of the string I'm searching for a
match in. My code is "if (isMatch && match.index == 0) { ... }. Can
I tell it to only check position 0 and if it's not a match move on?
The reason I use all the 0 or more things is the match could be as simple as CustomClass() = new CustomClass(), and as complicated as the above or perhaps a bit worse. I'm trying to get as many cases as possible.
This Regex is interested in "[\w,.]+(" and then "whatever may be inside the parentheses" (I tried to think of what all could be inside them based on the fact that it's valid VB code) until you get to the close parenthesis and then " = ". Perhaps I can use a wildcard for literally anything until it get's to ") = " in the string? - Like I said, fairly new to Regex.
Thanks in advance!
This seems to do what you want. Normally, I like to be more specific than .*, but it is working correctly. Note that I am using the Multi-line option.
^.*=\s*.+$
Here is a working example in RegExStorm.net example

Replacing a word in a text file

I'm doing a little program where the data saved on some users are stored in a text file. I'm using Sytem.IO with the Streamwriter to write new information to my text file.
The text in the file is formatted like so :
name1, 1000, 387
name2, 2500, 144
... and so on. I'm using infos = line.Split(',') to return the different values into an array that is more useful for searching purposes. What I'm doing is using a While loop to search for the correct line (where the name match) and I return the number of points by using infos[1].
I'd like to modify this infos[1] value and set it to something else. I'm trying to find a way to replace a word in C# but I can't find a good way to do it. From what I've read there is no way to replace a single word, you have to rewrite the complete file.
Is there a way to delete a line completely, so that I could rewrite it at the end of the text file and not have to worried about it being duplicated?
I tried using the Replace keyword, but it didn't work. I'm a bit lost by looking at the answers proposed for similar problems, so I would really appreciate if someone could explain me what my options are.
If I understand you correctly, you can use File.ReadLines method and LINQ to accomplish this.First, get the line you want:
var line = File.ReadLines("path")
.FirstOrDefault(x => x.StartsWith("name1 or whatever"));
if(line != null)
{
/* change the line */
}
Then write the new line to your file excluding the old line:
var lines = File.ReadLines("path")
.Where(x => !x.StartsWith("name1 or whatever"));
var newLines = lines.Concat(new [] { line });
File.WriteAllLines("path", newLines);
The concept you are looking for is called 'RandomAccess' for file reading/writing. Most of the easy-to-use I/O methods in C# are 'SequentialAccess', meaning you read a chunk or a line and move forward to the next.
However, what you want to do is possible, but you need to read some tutorials on file streams. Here is a related SO question. .NET C# - Random access in text files - no easy way?
You are probably either reading the whole file, or reading it line-for-line as part of your search. If your fields are fixed length, you can read a fixed number of bytes, keep track of the Stream.Position as you read, know how many characters you are going to read and need to replace, and then open the file for writing, move to that exact position in the stream, and write the new value.
It's a bit complex if you are new to streams. If your file is not huge, copying a file line for line can be done pretty efficiently by the System.IO library if coded correctly, so you might just follow your second suggestion which is read the file line-for-line, write it to a new Stream (memory, temp file, whatever), replace the line in question when you get to that value, and when done, replace the original.
It is most likely you are new to C# and don't realize the strings are immutable (a fancy way of saying you can't change them). You can only get new strings from modifying the old:
String MyString = "abc 123 xyz";
MyString.Replace("123", "999"); // does not work
MyString = MyString.Replace("123", "999"); // works
[Edit:]
If I understand your follow-up question, you could do this:
infos[1] = infos[1].Replace("1000", "1500");

Parsing a CSV file

We have an integration with another system that relies on passing CSV files back and forth (really old school).
The structure is generally:
ID, Name, PhoneNumber, comments, fathersname
1, tom, 555-1234, just some random text, bill
2, jill smith, 555-4234, other random text, richard
Every so often we see this:
3, jacked up, 999-1231, here
be dragons
amongst us, ted
The primary problem I care about is detecting that a line breaker (\n) occurs in the middle of the record when that is the record terminator.
Is there anyway I can preprocess this to reliably fix it?
Note that we have zero control over what the other system emits.
So you should be able to do something more or less like this:
for (int i = 0; i < lines.Count; i++)
{
var fields = lines[i].Split(',').ToList();
while (fields.Count < numFields)//here be dragons amonst us
{
i++;//include next line in this line
//check to make sure we haven't run out of lines.
//combine end of previous field with start of the next one,
//and add the line break back in.
var innerFields = lines[i].Split(',');
fields[fields.Count - 1] += "\n" + innerFields[0];
fields.AddRange(innerFields.Skip(1));
}
//we now know we have a "real" full line
processFields(fields);
}
(For simplicity I assumed all lines were read in at the start; I assume you could alter it to lazily fetch each line easily enough.)
Let me start and say that the CSV file in your example is invalid. If a line break occurs inside a string, it should be wrapped with double quote characters.
Now for the answer - In order to parse this invalid csv format you must do several assumptions. In this case I made 2 assumptions: 1) The ID column must be numeric 2) The comment field can not contain digits.
Based on these assumptions you can check the first character after the line break character. If it is digit, you assume its a new record. If not you should treat it as a continue value of the comment field.
I don't know if the second assumption is valid, if not, you can enhance the logic so it will cover the business rules of the system.
Good Luck!
Firstly I would recommend using a tool to manage reading and writing your csv files, I use the FileHelpers library which is great.
You can essentially type your records and it will do all the validation and such for you. Worth the effort.
To your question perhaps you can do some preprocessing on the file and use Regex to replace any line breaks with a space?
I do something similar (not with files but) try
line.Replace(Environment.NewLine, " ");
With FileHelpers you could write a custom converter to do this during processing, or hook into the BeforeRead event.

Deleting line breaks in a text file with a condition

I have a text file. Some of the lines in it end with lf and some end with crlf. I only need to delete lfs and leave all crlfs.
Basically, my file looks like this
Mary had a lf
dog.crlf
She liked her lf
dog very much. crlf
I need it to be
Mary had a dog.crlf
She liked her dog very much.crlf
Now, I tried just deleting all lfs unconditionally, but then I couldn't figure out how to write it back into the text file. If I use File.WriteAllLines and put a string array into it, it automatically creates line breaks all over again. If I use File.WriteAllText, it just forms one single line.
So the question is - how do I take a text file like the first and make it look like the second? Thank you very much for your time.
BTW, I checked similar questions, but still have trouble figuring it out.
Use regex with a negative look-behind and only replace the \n not preceded by a \r:
DEMO
var result = Regex.Replace(sampleFileContent, #"(?<!\r)\n", String.Empty);
The (?<! ... ) is a negative look-behind. It means that we only want to replace instances of \n when there isn't a \r directly behind it.
Disclaimer: This may or may not be as viable an option depending on the size of your file(s). This is a good solution if you're not concerned with overhead or you're doing some quick fixes, but I'd look in to a more robust parser if the files are going to be huge.
This is an alternative to Brad Christie's answer, which doesn't use Regex.
String result = sampleFileContent.Replace("\r\n", "**newline**")
.Replace("\n","")
.Replace("**newline**","\r\n");
Here's a demo. Seems faster than the regex solution according to this site, but uses a bit more memory.
Just tested it:
string file = File.ReadAllText("test.txt");
file = file.Replace("\r", "");
File.WriteAllText("test_replaced.txt", file);

Design strategy for a simple code parser

I'm attempting to write an application to extract properties and code from proprietary IDE design files. The file format looks something like this:
HEADING
{
SUBHEADING1
{
PropName1 = PropVal1;
PropName2 = PropVal2;
}
SUBHEADING2
{
{ 1 ; PropVal1 ; PropValue2 }
{ 2 ; PropVal1 ; PropValue2 ; OnEvent1=BEGIN
MESSAGE('Hello, World!');
{ block comments are between braces }
//inline comments are after double-slashes
END;
PropVal3 }
{ 1 ; PropVal1 ; PropVal2; PropVal3 }
}
}
What I am trying to do is extract the contents under the subheading blocks. In the case of SUBHEADING2, I would also separate each token as delimited by the semicolons. I had reasonably good success with just counting the brackets and keeping track of what subheading I'm currently under. The main issue I encountered involves dealing with the code comments.
This language happens to use {} for block comments, which interferes with the brackets in the file format. To make it even more interesting, it also needs to take into account double-slash inline comments and ignore everything up to the end of the line.
What is the best approach to tackling this? I looked at some of the compiler libraries discussed in another article (ANTLR, Doxygen, etc.) but they seem like overkill for solving this specific parsing issue.
I'd suggest writing a tokenizer and parser; this will give you more flexibility. The tokenizer basically does a simple text-wise breakdown of the sourcecode and puts it into more usable data structure; the parser figures out what to do with it, often leveraging recursion.
Terms to google: tokenizer, parser, compiler design, grammars
Math expression evaluator: http://www.codeproject.com/KB/vb/math_expression_evaluator.aspx
(you might be able to take an example like this and hack it apart into what you want)
More info about parsing: http://www.codeproject.com/KB/recipes/TinyPG.aspx
You won't have to go nearly as far as those articles go, but, you're going to want to study a bit on this one first.
You should be able to put something together in a few hours, using regular expressions in combination with some code that uses the results.
Something like this should work:
- Initialize the process by loading the file into a string.
Pull each top-level block from the string, using regex tags to separately identify the block keyword and contents.
If a block is found,
Make a decision based on the keyword
Pass the content to this process recursively.
Following this, you would process HEADING, then the first SUBHEADING, then the second SUBHEADING, then each sub-block. For the sub-block containing the block comment, you would presumably know based on the block's lack of a keyword that any sub-block is a comment, so there is no need to process the sub-blocks.
No matter which solution you will choose, I'm pretty sure the best way is to have 2 parsers/tokenizers. One for the main file structure with {} as grouping characters, and one for the code blocks.

Categories