Unable to remove a special character from string [duplicate]

This question already has answers here:
Why do we always prefer using parameters in SQL statements?
(7 answers)
Closed 5 years ago.
Background
There is an application where users are required to enter information that is stored in a DB. I have a second application that runs every 5 minutes and reads the information entered through the first one. My app grabs all the information from the database, creates the given document, and places it on a server for the user to get. However, users started having issues with a specific document, where certain functionality was not executing correctly. I traced the issue to a string that a user entered in the entry application: in the title column they had "Jame's Bond Story". My application creates the document without raising any error whatsoever, so after debugging I identified the following problem.
Problem
Not sure how the specific user did what he did, but the single quote ' was not really a single quote but some other weird character. I tested this by running the following code to see if I could remove it.
string cleanTitle = BookRec.TitleName.Replace("'","");
However, this did not work at all. I then broke the string into a character array, and instead of getting the character I got a weird digit. So I then used this regex to clean every character and only allow numbers and letters.
string cleanTitle = Regex.Replace(BookRec.TitleName, "[^\\w\\. _]", "");
This has now become an issue because the users want the title to be able to contain the following special characters: ( ) _ , - .
I am looking for a way to filter out any other characters, including the type I ran into this week, and only allow the 6 characters which the users have agreed to. I came up with the following regex:
Regex fomrula = new Regex(@"^[a-zA-Z0-9_\[])(,\-.'");
However, I get an empty string when I replace the title with it. I am not a big fan of regex, so I am also open to a substring approach.
Appended Information
I am not able to access the application that inserts the information into the given database. I am only able to read from the database and then perform actions.

You may want to try something like this:
string cleanTitle = Regex.Replace(BookRec.TitleName, @"[^\u0000-\u007F]+", "");
This removes every character outside the ASCII range (U+0000 to U+007F). I'm not sure if those are the ones causing your problem, but hopefully it gives you a hint in the right direction.
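If you would rather whitelist than blacklist, a negated character class may be closer to what the question asks for. This is only a sketch: it assumes the allowed set is ASCII letters, digits, spaces and the six characters ( ) _ , - . and it guesses that the odd character was something like a typographic apostrophe (U+2019). The names TitleCleaner and Clean are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleCleaner
{
    // Matches anything that is NOT an ASCII letter, a digit, a space,
    // or one of the six agreed characters ( ) _ , - .
    private static readonly Regex Disallowed =
        new Regex(@"[^a-zA-Z0-9 ()_,.\-]", RegexOptions.Compiled);

    // Removes every disallowed character from the title.
    public static string Clean(string title)
    {
        return Disallowed.Replace(title, "");
    }

    public static void Main()
    {
        // Assumption: \u2019 stands in for the "weird" quote from the question.
        string raw = "Jame\u2019s Bond Story (Part-1, v2.0)";
        Console.WriteLine(Clean(raw)); // James Bond Story (Part-1, v2.0)
    }
}
```

Because the class is negated, anything you did not list explicitly gets dropped, including the plain ASCII apostrophe.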

Related

How do you delete text surrounding a string that you want?

I've looked online for this but not been able to find an answer unfortunately (sorry if there is something I have missed).
I have some code which filters out a specific string (which can change depending on what is read from the serial port). I want to be able to delete all of the characters which I am not using.
e.g. the string I want from the text below is "ThisIsTheStringIWant"
efefhokiehfdThisIsTheStringIWantcbunlokew
Now, I already have a function with some code which will identify this and print it to where I want. However, as the comms could be coming in from multiple ports at any frequency, before printing the string to where I want it, I need to have a piece of code which will recognise everything I don't want and delete it from my buffer.
e.g. Using the same random text above, I want to get rid of the two random strings at the ends (which are before and after "ThisIsTheStringIWant" in the middle).
efefhokiehfdThisIsTheStringIWantcbunlokew
I have tried using the highest voted answer from this question, however I can't find a way to delete the unwanted text before my wanted string. Remove characters after specific character in string, then remove substring?
If anyone can help, that would be great!
Thanks!
Edit:
Sorry, I should have probably made my question clearer.
Any possible number of characters could be before and/or after the actual string I want, and as the string I want is coming from a serial port it will be different every time depending on what comms are coming in. In my application I have a cell in a DGV called "Extract" where I type in the first bit of the comms I am expecting (in this case, the extract would be This). But that will be different depending on what I am doing.

Find the position of the string you want, delete everything before that position, then delete everything after the end of your string.
String: efefhokiehfdThisIsTheStringIWantcbunlokew
Step 1 - "ThisIsTheStringIWant" starts at position 13, so delete the first twelve, leaving...
String: ThisIsTheStringIWantcbunlokew
Step 2 - "ThisIsTheStringIWant" is 20 characters long, so delete from character 21 to the length of the string, leaving:
String: ThisIsTheStringIWant
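In C# the two steps could look roughly like this. It is a sketch that assumes you already know the exact text you are after; BufferTrimmer and TryExtract are hypothetical names. Note that IndexOf is zero-based, so "position 13" above shows up as index 12.

```csharp
using System;

public static class BufferTrimmer
{
    // Cuts `wanted` out of `buffer`, dropping whatever surrounds it.
    // Returns false when the buffer does not contain the wanted text.
    public static bool TryExtract(string buffer, string wanted, out string result)
    {
        result = string.Empty;
        int start = buffer.IndexOf(wanted, StringComparison.Ordinal); // zero-based
        if (start < 0) return false;

        string tail = buffer.Substring(start);      // step 1: drop everything before the match
        result = tail.Substring(0, wanted.Length);  // step 2: drop everything after it
        return true;
    }

    public static void Main()
    {
        string buffer = "efefhokiehfdThisIsTheStringIWantcbunlokew";
        if (TryExtract(buffer, "ThisIsTheStringIWant", out string found))
            Console.WriteLine(found); // ThisIsTheStringIWant
    }
}
```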

Is there a better way to check if an entire string was matched? [duplicate]

This question already has answers here:
Match exact string
(3 answers)
Closed 3 years ago.
I'm parsing a text file line by line and for each line I have a special regex. However, in one case a pattern matches two lines: one is a correct match, and the other matches only partially because a couple of values are optional.
Invalid match:
BNE1010/1000 HKG1955/2005 7/PLD/CLD/YLD
matches a partial string (it shouldn't match this at all):
BNE1010/1000
Correct match (matches the entire string):
RG878A/21AUG15 GIG/BOG 1/RG/AV 3/AV 4/AV 5/RG 6/AV081C/22 7/CDC/YD 9/TP
The regex for this is quite long and contains several optional groups:
^(?<FlightDesignator>([A-Z0-9]{2}[A-Z]?)([0-9]{3,4}))(?<OperationalSuffix>[A-Z])?(?<FlightIdentifierDate>\/(\d{2})([A-Z]{3})?(\d{2})?)?(\s(?<FlightLegsChangeIdentifier>(\/?[A-Z]{3})+)(?=(\s|$)))?(\s1(?<JointOperationAirlineDesignators>(\/.{2}[A-Z]?)+))?(\s3\/(?<AircraftOwner>([A-Z]{2}|.)))?(\s4\/(?<CockpitCrewEmployer>(.+?)(?=(?: \d\/|$))))?(\s5\/(?<CabinCrewEmployer>([A-Z]{2}|.)))?(?<OnwardFlight>\s6\/(([A-Z0-9]{2}[A-Z]?)([0-9]{3,4}))([A-Z])?(\/(\d{2})([A-Z]{3})?(\d{2})?)?)?(\s7\/(?<MealServiceNote>(\/?[A-Z]{0,3})+))?(\s9\/(?<OperatingAirlineDisclosure>(.{2}[A-Z]?)))?
I think there is no need to study the entire regex because it's built dynamically from smaller patterns at runtime and all the parts work correctly. Also, lots of combinations are covered by unit tests and they all work... as long as I try to parse only the line that should be matched by the pattern.
Currently I'm checking if the entire string is matched by
match.Groups[0].Value == line
but I find it quite ugly. I know from JavaScript that the regex engine provides an Index property telling where the regex engine stopped. So my idea was to compare the index with the length of the string. Unfortunately I wasn't able to find such a property in C#.
Another idea would be to modify the regex so that it matches only one line and no partial lines.
Example: https://regex101.com/r/dM5wU4/1
The example contains only two cases because there aren't actually any combinations that would change its behavior. I could remove some parameters but it wouldn't change anything.
EDIT:
I've edited my question. Sorry to everyone for not providing all the information the first time. I won't ask any more questions when writing on the phone :) It wasn't a good idea. Hopefully it won't get closed now.
You asked whether I could simplify the regex. I would do it if I could and knew how. If it was easy I wouldn't have asked. The problem started as the regex and string became bigger during development. Now they are at the production length and I can't actually make them shorter even for the sake of the question, sorry.
EDIT-2:
I found the reason why I couldn't find the inherited Index and Length properties of the Match class.
For some strange reason, when selecting the Match class and pressing F1, Visual Studio opened the wrong help page (Match Properties) even though I'm not working with the Micro Framework. I didn't notice that, but I was indeed wondering why there was so little information. Thanks to @Jamiec for the correct link. I won't trust Visual Studio anymore when hitting F1.

Disclaimer: I'm going to add this, but I doubt it's the solution. If it's not, this part will get deleted in short order.
You can add a $ at the end of your regular expression. This stops your first example matching but continues to match the second example.
As you've not provided any more than 2 examples, it's unclear if this actually solves all your cases or just that one specific false positive.
My question is whether it is possible to check if a regular expression matched the entire string without checking the first group against the original line?
If you're not averse to comparing the length of the match to the length of the string, you can do that too:
var regex = new Regex(@"^(?<FlightDesignator>([A-Z0-9]{2}[A-Z]?)([0-9]{3,4}))(?<OperationalSuffix>[A-Z])?(?<FlightIdentifierDate>\/(\d{2})([A-Z]{3})?(\d{2})?)?(\s(?<FlightLegsChangeIdentifier>(\/?[A-Z]{3})+)(?=(\s|$)))?(\s1(?<JointOperationAirlineDesignators>(\/.{2}[A-Z]?)+))?(\s3\/(?<AircraftOwner>([A-Z]{2}|.)))?(\s4\/(?<CockpitCrewEmployer>(.+?)(?=(?: \d\/|$))))?(\s5\/(?<CabinCrewEmployer>([A-Z]{2}|.)))?(?<OnwardFlight>\s6\/(([A-Z0-9]{2}[A-Z]?)([0-9]{3,4}))([A-Z])?(\/(\d{2})([A-Z]{3})?(\d{2})?)?)?(\s7\/(?<MealServiceNote>(\/?[A-Z]{0,3})+))?(\s9\/(?<OperatingAirlineDisclosure>(.{2}[A-Z]?)))?");
var input1 = @"BNE1010/1000 HKG1955/2005 7/PLD/CLD/YLD";
var input2 = @"RG878A/21AUG15 GIG/BOG 1/RG/AV 3/AV 4/AV 5/RG 6/AV081C/22 7/CDC/YD 9/TP";
var match1 = regex.Match(input1);
var match2 = regex.Match(input2);
Console.WriteLine(match1.Length == input1.Length); // False
Console.WriteLine(match2.Length == input2.Length); // True
Live example: http://rextester.com/NIBE6349
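If you end up doing this check in several places, it might be worth wrapping it in a small helper. The sketch below uses the Index and Length properties that Match inherits from Capture; the FullMatch class name and the shortened stand-in pattern are my own and are not the production regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FullMatch
{
    // True only when the first match starts at index 0 and spans the whole input.
    public static bool IsFullMatch(Regex regex, string input)
    {
        Match m = regex.Match(input);
        return m.Success && m.Index == 0 && m.Length == input.Length;
    }

    public static void Main()
    {
        // Simplified stand-in: flight designator plus an optional "/digits" part.
        var regex = new Regex(@"^[A-Z0-9]{2}[A-Z]?[0-9]{3,4}(/\d{2,4})?");

        Console.WriteLine(IsFullMatch(regex, "BNE1010/1000 HKG1955/2005")); // False
        Console.WriteLine(IsFullMatch(regex, "BNE1010/1000"));              // True
    }
}
```

If you go the anchor route instead, keep in mind that in .NET a plain $ also matches just before a final newline, whereas \z only matches at the very end of the string.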

Reverse RegExp from user entered string ( C#)

Is it possible to generate regular expressions from a user entered string? Are there any C# libraries to do this?
For example a user enters a string e.g. ABCxyz123 and the C# code automatically generates [A-Z]{3}[a-z]{3}\d{3}.
This is a simple string but we could have more complicated strings like
MON-0123/AB/5678-abc 2/7
Or
1234-678/abc::1234ABC?246
I already have a string tokeniser (from a previous stackoverflow question) so I could construct a regex from the list of tokens.
But I was wondering if there is a lib or C# code out there that’ll do it.
Edit: Important, I should have also said: it's not the actual characters in the string that are important but the type of character and how many.
e.g A user could enter a "pattern" string of ABCxyz123.
This would be interpreted as
3 upper case alphas followed by
3 lower case alphas followed by
3 digits
So other users must then enter strings that match the compiled pattern [A-Z]{3}[a-z]{3}\d{3}, e.g. QAZplm789
It's the format of user entered strings that needs to be checked, not the actual content, if that makes sense.

Jerry has a related link
creating a regular expression for a list of strings
There are a few other links off this.
I'm not trying to do anything complicated e.g NLP etc.
I could use C# expression builder and dynamic linq at a push, but that seems overkill and a code maintenance nightmare.
I'll write my own "simple" regex builder from the tokenized string.
Example Use Case:
An admin office user where I work could set up the string patterns for each field by typing a sample string. My code converts this to a regex, and I store these in a database.
E.g.: Field one requires 3 digits at the start. If there are 2 digits, send to workflow 1; if 3, send to workflow 2. I could simply check the number of chars with substr or whatever, but that would be a concrete solution.
I am trying to do this generically for multiple documents with multiple fields. Also, each field could have multiple format checkers.
I don't want to write specific C# checks for every single field in numerous documents.
I'll get on with it, should keep me amused for a couple of days.
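For what it's worth, a "simple" builder along those lines might start out something like this. It is only a sketch under my own assumptions: it walks the raw sample string rather than a token list, maps ASCII upper case, lower case and digits to classes, and treats everything else as a literal. PatternBuilder, ClassOf and FromSample are made-up names.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class PatternBuilder
{
    // Maps one character to the regex fragment that stands for its "type".
    private static string ClassOf(char c)
    {
        if (c >= 'A' && c <= 'Z') return "[A-Z]";
        if (c >= 'a' && c <= 'z') return "[a-z]";
        if (c >= '0' && c <= '9') return @"\d";
        return Regex.Escape(c.ToString()); // separators are matched literally
    }

    // Collapses runs of the same character type into one class with a count.
    public static string FromSample(string sample)
    {
        var sb = new StringBuilder("^");
        int i = 0;
        while (i < sample.Length)
        {
            string cls = ClassOf(sample[i]);
            int run = 1;
            while (i + run < sample.Length && ClassOf(sample[i + run]) == cls) run++;

            sb.Append(cls);
            if (run > 1) sb.Append('{').Append(run).Append('}');
            i += run;
        }
        return sb.Append('$').ToString();
    }

    public static void Main()
    {
        string pattern = FromSample("ABCxyz123");
        Console.WriteLine(pattern);                             // ^[A-Z]{3}[a-z]{3}\d{3}$
        Console.WriteLine(Regex.IsMatch("QAZplm789", pattern)); // True
        Console.WriteLine(Regex.IsMatch("QAZplm78", pattern));  // False
    }
}
```

The generated pattern is anchored with ^ and $ so that a longer string containing the format somewhere in the middle does not pass.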

String processing / CSV challenge

Having used SQL Server Bulk insert of CSV file with inconsistent quotes (CsvToOtherDelimiter option) as my basis, I discovered a few weirdnesses with the RemoveCSVQuotes part [it chopped the last char from quoted strings that contained a comma!]. So.. rewrote that bit (maybe a mistake?)
One wrinkle is that the client has asked 'what about data like this?'
""17.5179C,""
I assume if I wanted to keep using the CsvToOtherDelimiter solution, I'd have to amend the RegExp...but it's WAY beyond me... what's the best approach?
To clarify: we are using C# to pre-process the file into a pipe-delimited format prior to running a bulk insert using a format file. Speed is pretty vital.

The accepted answer from your link starts with:
You are going to need to preprocess the file, period.
Why not transform your csv to xml? Then you would be able to verify your data against an xsd before storing into a database.

To convert a CSV string into a list of elements, you could write a program that keeps track of state (in quotes or out of quotes) as it processes the string one character at a time, and emits the elements it finds. The rules for quoting in CSV are weird, so you'll want to make sure you have plenty of test data.
The state machine could go like this:
1. scan until quote (go to 2) or comma (go to 3)
2. if the next character is a quote, add only one of the two quotes to the field and return to 1. Otherwise, go to 4 (or report an error if the quote isn't the first character in the field).
3. emit the field, go to 1
4. scan until quote (go to 5)
5. if the next character is a quote, add only one of the two quotes to the field and return to 4. Otherwise, emit the field, scan for a comma, and go to 1.
This should correctly scan stuff like:
hello, world, 123, 456
"hello world", 123, 456
"He said ""Hello, world!""", "and I said hi"
""17.5179C,"" (correctly reports an error, since there should be a
separator between the first quoted string "" and the second field
17.5179C).
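A rough C# sketch of such a scanner is below. It is a simplified reading of the state machine rather than a literal transcription: it assumes spaces around fields can be skipped, that a quote is only legal as the first character of a field, and that a doubled quote inside a quoted field is an escaped quote. CsvLineSplitter and Split are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLineSplitter
{
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var field = new StringBuilder();
        int i = 0;
        while (true)
        {
            while (i < line.Length && line[i] == ' ') i++; // skip spaces before a field
            field.Clear();

            if (i < line.Length && line[i] == '"')
            {
                i++; // consume the opening quote
                bool closed = false;
                while (i < line.Length && !closed)
                {
                    if (line[i] != '"') { field.Append(line[i]); i++; }
                    else if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        field.Append('"'); // doubled quote: keep one of the two
                        i += 2;
                    }
                    else { closed = true; i++; } // closing quote
                }
                if (!closed) throw new FormatException("Unterminated quoted field.");

                while (i < line.Length && line[i] == ' ') i++;
                if (i < line.Length && line[i] != ',')
                    throw new FormatException("Expected a separator after a quoted field.");
            }
            else
            {
                while (i < line.Length && line[i] != ',')
                {
                    if (line[i] == '"')
                        throw new FormatException("Quote inside an unquoted field.");
                    field.Append(line[i]);
                    i++;
                }
            }

            fields.Add(field.ToString()); // emit the field
            if (i >= line.Length) break;
            i++; // step over the comma
        }
        return fields;
    }

    public static void Main()
    {
        foreach (string f in Split("\"He said \"\"Hello, world!\"\"\", \"and I said hi\""))
            Console.WriteLine(f);
        // He said "Hello, world!"
        // and I said hi
    }
}
```

With this reading, the client's ""17.5179C,"" sample throws a FormatException, because the empty quoted string is followed directly by more text instead of a separator.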
Another way would be to find some existing library that does it well. Surely, CSV is common enough that such a thing must exist?
edit:
You mention that speed is vital, so I wanted to point out that (so long as the quoted strings aren't allowed to include line returns...) each line may be processed independently in parallel.

I ended up using the csv parser that I didn't know we already had (it comes as part of our code generation tool) - and noting that ""17.5179C,"" is not valid and will cause errors.

Optimistic RegEx Matching for User Text Entry

I'm working on a text entry application that uses regular expressions to validate user input. The goal is to allow keypresses that fit a certain RegEx while rejecting invalid characters. One issue I've run into is that when a user starts inputting information they may create a string that doesn't yet match the given regex, but could cause a match in the future. These strings get erroneously rejected. Here's an example - given the following regex for inputting date information:
(0?[1-9]|10|11|12)/(0?[1-9]|[12]\\d|30|31)/\\d{2}\\d{2}
A user may begin entering "1/" which could be a valid date, but RegEx.IsMatch() will return false and my code ends up rejecting the string. Is there a way to "optimistically" test strings against a regular expression so that possible or partial matches are allowed?
Bonus: For this RegEx in particular there are some sequences which cause required characters. For example, if the user types "2/15" the only possible valid character they could enter next is "/". Is it possible to detect those scenarios so that the required characters could be automatically entered for the user to ease input?

What you can do is anchor your RegExp (i.e. adding ^ and $, as in start/end of line) and make some components optional for validation, but strictly defined if present.
Something looking like this:
^(0?[1-9]|10|11|12)(/((0?[1-9]|[12]\\d|30|31)(/(\\d{2}(\\d{2})?)?)?)?)?$
I do realize it looks horrible but as far as I know there is no way to tell the regexp engine to validate as long as the string satisfies the beginning of the regexp pattern.
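One caveat with the nested-optional approach: as far as I can tell, the pattern above still rejects a lone "0" (the start of "01") and a three-digit year such as "2/15/201". A variant that spells out the intermediate states could look like the sketch below. The Prefix pattern and the DateEntry names are my own assumptions, and Complete is just the original pattern from the question with anchors added.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateEntry
{
    // Hypothetical prefix pattern: accepts any string that can still be
    // extended into M/D/YYYY, including the empty string.
    private static readonly Regex Prefix = new Regex(
        @"^(?:0|(?:0?[1-9]|1[0-2])(?:/(?:0|(?:0?[1-9]|[12]\d|3[01])(?:/\d{0,4})?)?)?)?$");

    // The full date pattern from the question, anchored at both ends.
    private static readonly Regex Complete = new Regex(
        @"^(0?[1-9]|10|11|12)/(0?[1-9]|[12]\d|30|31)/\d{2}\d{2}$");

    // Use while the user is typing.
    public static bool IsAcceptablePrefix(string typed) => Prefix.IsMatch(typed);

    // Use when the field loses focus or the form is submitted.
    public static bool IsComplete(string typed) => Complete.IsMatch(typed);

    public static void Main()
    {
        foreach (string s in new[] { "1/", "0", "2/15/201", "13", "2/15/2015" })
            Console.WriteLine($"{s}: prefix={IsAcceptablePrefix(s)}, complete={IsComplete(s)}");
    }
}
```

The idea is to run the lenient check on each keypress and keep the strict pattern for the final validation.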
In my opinion, the best way to achieve what you want to do is to create separate inputs for day/month/year and check their value when leaving the text field.
It also provides a better visibility and user-experience, as I believe no one likes to be prevented from typing certain characters into a text field with or without noticing them disappear as they type or having slashes inserted automatically and without notice.

Have you ever used an app or form that worked that way, simply refusing to accept any keypress it didn't like? If the answer is Yes, did it blow an electronic raspberry each time you pressed a wrong key?
If you really need to validate the input before the form is submitted, use a passive feedback mechanism like a red border around the textfield that disappears when the regex matches the input. Also, make sure there's a Help button or a tooltip nearby to provide constructive feedback.
Of course, the best option would be to use a dedicated control like a date-entry widget. But whatever you do, don't do it in such a way that it feels like you're playing guessing games with the user.
