Regex for Pipe Delimited with Quoted Identifiers [duplicate] - c#

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Parsing CSV files in C#
I have a C# application that parses a pipe delimited file. It uses the Regex.Split method:
Regex.Split(line, @"(?<!(?<!\\)*\\)\|")
However, a data file recently came across with a pipe included in one of the data fields. The field in question used quoted identifiers, so when you open the file in Excel it opens correctly.
For example I have a file that looks like:
Field1|Field2|"Field 3 has a | inside the quotes"|Field4
When I use the above regex it parses to:
Field1
Field2
Field 3 has a
inside the quotes
Field4
when I would like
Field1
Field2
Field 3 has a | inside the quotes
Field4
I've done a fair amount of research and can't seem to get the Regex.Split to split the file on pipes but respect the quoted identifiers. Any help is greatly appreciated!

Here is a quick expression I've thrown together that seems to do the trick:
"([^"]+)"|([^\|]+)
Your expression also seems to do something with backslashes, so you may need to extend this one to cover any other requirements. I have ignored them here because the question does not explain why they are there, and they may not need to be there at all.
Note that my expression also skips empty fields (i.e. 1||2|3 comes out as 1, 2 and 3 only). I don't know whether that is what you need; if it isn't, let me know and I can change the expression to cater for that too.
Hope this helps anyway.
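As a rough sketch of how the expression above could be applied, the code below uses Regex.Matches rather than Regex.Split and reads each field from whichever capture group matched. The class and method names (PipeSplitter, SplitFields) are made up for illustration, and the sketch assumes quoted fields never contain an embedded double quote.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PipeSplitter
{
    // The expression from the answer: group 1 captures the inside of a quoted
    // field, group 2 captures a run of characters that contains no pipe.
    private static readonly Regex FieldPattern =
        new Regex("\"([^\"]+)\"|([^\\|]+)");

    // Hypothetical helper: returns the fields of one line in order.
    // Like the expression itself, it skips empty fields.
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        foreach (Match m in FieldPattern.Matches(line))
        {
            // Prefer the quoted capture so the surrounding quotes are dropped.
            fields.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value);
        }
        return fields;
    }
}
```

Calling PipeSplitter.SplitFields on the sample line from the question should give four fields, with the third one keeping its pipe.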

Related

What does the @property_name declaration mean in C#? [duplicate]

This question already has answers here:
What's the use/meaning of the @ character in variable names in C#?
(9 answers)
Closed 1 year ago.
I was browsing the sources of the Microsoft .NET Core runtime and came across lines of code that use the @ symbol in front of every error message getter, such as @Error_InvalidFilePath.
My question is: what language feature is being used here?
And where can I read more about it?
Thanks
The @ is a way to use reserved words as names. E.g. the reserved word class can be used as a variable name by writing @class.
For non-reserved names it adds nothing. But of course you don't know which names will become reserved in the future. Your code example is generated code, which should preferably keep working with newer language versions, so the @ makes sense there.
See docs
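A small sketch of the verbatim identifier prefix (written @ in C# source) may make this concrete. The class and method names are invented for the example.

```csharp
public static class VerbatimIdentifierDemo
{
    public static string Describe()
    {
        // 'class' is a reserved word, so it can only be used as a
        // variable name with the @ prefix.
        string @class = "reserved";

        // 'name' is not reserved: @name and name are the same identifier,
        // so the prefix changes nothing here.
        string @name = "plain";

        return @class + "," + name;
    }
}
```

The prefix is not part of the identifier itself, which is why the last line can refer to the second variable without it.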

How to split concatenated JSON files using C# [duplicate]

This question already has an answer here:
What is the correct way to use JSON.NET to parse stream of JSON objects?
(1 answer)
Closed 4 years ago.
I've got to process files that are full of JSON objects. These have simply been concatenated together with no separator thus making the whole file invalid JSON. What is the best way to split this up again? I need to ensure that I don't end up splitting in encoded strings and it needs to be fairly fast as the file can be quite big.
Example file:
{"property":"Data which may include}{"}{"property":"A second object"}
I've done a lot of parsing like this. There's so much JSON code out there that it's rarely necessary to write your own JSON parser. But if you really need to parse this yourself in C#, I see no way to approach it other than parsing it manually, character by character.
Special attention needs to be given to curly braces and colons. And, when parsing tokens you'll need to determine if it's quoted. If it's quoted, then you go until the closing quote (ignore any escaped quotes). If it's not quoted, then you go until you hit a non-symbol character.
You might find this task a little easier using my Text Parsing Helper Class to handle some of the lower-level string handling of your parser.
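A minimal sketch of the character-by-character approach, under two assumptions: every top-level value in the file is a JSON object, and the whole text fits in memory. It only tracks brace depth and whether the scanner is inside a string, so it does not validate the JSON. The names ConcatenatedJson and SplitObjects are hypothetical.

```csharp
using System.Collections.Generic;

public static class ConcatenatedJson
{
    // Splits text such as {...}{...} into one string per top-level object.
    public static List<string> SplitObjects(string text)
    {
        var objects = new List<string>();
        int depth = 0;          // current nesting level of curly braces
        int start = -1;         // index where the current top-level object began
        bool inString = false;  // true while inside a quoted JSON string
        bool escaped = false;   // true right after a backslash inside a string

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (inString)
            {
                // Braces inside strings are ignored; only look for the closing quote.
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
                continue;
            }

            if (c == '"')
            {
                inString = true;
            }
            else if (c == '{')
            {
                if (depth == 0) start = i;
                depth++;
            }
            else if (c == '}' && depth > 0)
            {
                depth--;
                if (depth == 0) objects.Add(text.Substring(start, i - start + 1));
            }
        }
        return objects;
    }
}
```

For a very large file the same state machine could be driven from a StreamReader one character at a time instead of from a string; each returned piece can then be handed to whatever JSON library is already in use.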

C# How to check if a string is misspelled [duplicate]

This question already has answers here:
Comparing list of strings with an available dictionary/thesaurus
(2 answers)
Closed 7 years ago.
I'm using C# to write a program that generates lines of text over and over. The user enters a set of numbers, 1-26, in whatever order, and the program matches each number to a letter.
The point is to have it go through every order of the alphabet until it eventually generates an actual word. For example, someone could enter 7-2-15-26-3, and it would eventually read that set of numbers as "hello".
I got the program to work and to print every outcome to a txt file, but because there are so many different possible outcomes, it is almost impossible to find an actual word in the file without going through every single line.
One of my tests only had 11 letters to choose from, it took a few minutes to finish and the txt file was so big, it would not open.
So my question is, does anyone know of a library or spell check that I could use to check if each string is an actual word? If I could check it each time, I could have it only print the outcomes that are words. I would have it check against preset words, but I won't always know what the outcome will be so I need to check against everything.
I have searched online but haven't found much. Again, I'm using C#. Thank you for any help.
Edit: Sorry about asking a question that had already been answered, I didn't see the other question before. I'll try the NHunspell and see how that works.
Try NHunspell, it's free (a .NET version of the popular "Hunspell" spell checker).
E.g.
Check Spelling,
bool correct = hunspell.Spell("Recommendation");
Get suggestions,
List<string> suggestions = hunspell.Suggest("Recommendatio");
More c# code samples
I suggest that you incorporate an English dictionary into your application so that you have something to check against.
Every time a new word is generated, look it up in the dictionary (for example with a regex match) and return null if no word matches.
Hope this helps.
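One way the dictionary idea could look in code is sketched below. It swaps the regex match for a HashSet lookup, which is a different technique from the one described above but avoids scanning the word list for every candidate. The names WordFilter, BuildDictionary, and KeepRealWords are hypothetical, and where the word list comes from (for example a text file with one word per line) is left open.

```csharp
using System;
using System.Collections.Generic;

public static class WordFilter
{
    // Builds a case-insensitive lookup set from any list of known words.
    public static HashSet<string> BuildDictionary(IEnumerable<string> words)
    {
        return new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
    }

    // Returns only the candidates that appear in the dictionary,
    // preserving their original order.
    public static List<string> KeepRealWords(
        IEnumerable<string> candidates, HashSet<string> dictionary)
    {
        var result = new List<string>();
        foreach (string candidate in candidates)
        {
            if (dictionary.Contains(candidate)) result.Add(candidate);
        }
        return result;
    }
}
```

With this in place the generator could test each string as it is produced and write only the hits to the output file.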

How to replace signs in quotes in a string [duplicate]

This question already has answers here:
Parsing CSV files in C#, with header
(19 answers)
Closed 7 years ago.
I have a semicolon-separated string that contains values of every type. String and date values are in quotation marks.
Now I have an evil string where an inner string contains a semicolon that I need to remove (replace with nothing).
eg:
"Value1";0;"Value2";4711;"Evil; Value";"2015-09-03"
in C#:
string value = "\"Value1\";0;\"Value2\";4711;\"Evil; Value\";\"2015-09-03\"";
So how do I replace all semicolons that are inside quotation marks? Can anybody help?
Regex is awful at handling delimited strings. It can do it, but it's not often as good of a choice as it first appears. This is one of several reasons why.
Instead, you should use a dedicated delimited string parser. There are (at least) three built into the .NET Framework. The TextFieldParser type is one of those, and it will handle this correctly.
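You could try this, which matches only those semicolons that are preceded by a character other than a double quote. Note that it also matches a semicolon that follows an unquoted value, such as the one after 0 in the example: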
(?<=[^"]);
Here is demo
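If only the removal is needed and not full parsing, a plain scan that toggles an "inside quotes" flag is one possible alternative to both suggestions above. This is a sketch that assumes fields never contain an escaped or doubled quote; QuotedSemicolons and RemoveInsideQuotes are made-up names.

```csharp
using System.Text;

public static class QuotedSemicolons
{
    // Copies the input, dropping every semicolon that sits between
    // an opening and a closing double quote.
    public static string RemoveInsideQuotes(string input)
    {
        var output = new StringBuilder(input.Length);
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == '"') inQuotes = !inQuotes;   // entering or leaving a quoted value
            if (c == ';' && inQuotes) continue;   // skip separators inside quotes
            output.Append(c);
        }
        return output.ToString();
    }
}
```

Semicolons between fields are left alone, so the result can still be split on ';' afterwards.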

Regex for matching US phone number with or without area code [duplicate]

This question already has answers here:
How to validate phone numbers using regex
(43 answers)
Closed 9 years ago.
My question was marked as a duplicate so I've made a couple edits. As I said, I was able to find many similar questions when I searched but none were quite what I needed. I am not validating a string where the only thing present will be the phone number (this seems to be what most of the other questions are addressing). Rather, I am attempting to pull out all phone numbers (which will then be manually checked by the user) from a larger block of text. The problem I am having is that my regular expression is matching zip codes with extensions (ex: 45202-4787), and I am not sure how to alter my regex to avoid that. If this truly is a duplicate question then I apologize for not being able to find the existing one that deals with my issue.
My specifications for phone number format are:
1) -, ., and space as delimiters (and in any combination)
2) area code may appear with or without parentheses
A few examples:
(xxx) xxx-xxxx
(xxx) xxx.xxxx
xxx-xxx-xxxx
xxx xxx-xxxx
xxxxxxxxxxx
I am using Anirudh's regex from the comments:
(\(?\d{3}\)?)?[. -]?\d{3}[. -]?\d{4}
Again, my problem is that this regex matches zip codes with extensions (ex: 45202-4787).
I would be grateful for any help, as I'm very new to using regular expressions. Thanks!
This should do it:
^(\([0-9]{3}\)|[0-9]{3})[ .-]?[0-9]{3}[ .-]?[0-9]{4}$
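Because the goal in the question is to pull numbers out of a larger block of text, an anchored pattern will not find them. One possible variation, sketched below, drops the anchors and adds digit lookarounds so that a match cannot start or end in the middle of a longer run of digits, which is what lets a ZIP+4 code such as 45202-4787 through. The pattern is an assumption of this sketch, not the regex from the comments, and PhoneFinder and FindCandidates are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PhoneFinder
{
    // (?<!\d)  : the match must not be preceded by a digit
    // optional area code, with or without parentheses, plus its delimiter
    // three digits, optional delimiter, four digits
    // (?!\d)   : the match must not be followed by a digit
    private static readonly Regex PhonePattern = new Regex(
        @"(?<!\d)(?:(?:\(\d{3}\)|\d{3})[-. ]?)?\d{3}[-. ]?\d{4}(?!\d)");

    // Returns every candidate phone number found in the text,
    // for the user to check manually.
    public static List<string> FindCandidates(string text)
    {
        var found = new List<string>();
        foreach (Match m in PhonePattern.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

The lookbehind is what rejects 202-4787 inside 45202-4787, since that candidate would be preceded by the digit 5.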
