Returned value not escaped

I am trying to use the HL7-dotnetcore library, but I have a problem.
In an HL7 message (an ADT-A28), in segment PID, field 5, component 1, sub-component 1 (because the surname has sub-components), I have a string like this:
AAA\F\aa
Once the escape sequence "\F\" is converted, the surname should be "AAA|aa" or something similar.
Now, when I call
message.Segments("PID")[0].Fields(5).Components(1).SubComponents(1).Value
the returned value has the escape sequences decoded.
In fact, I can see the correct surname (AAA|aa).
But if I remove the sub-components from the surname and call
message.Segments("PID")[0].Fields(5).Components(1).Value
the returned value does not have the escape sequences decoded.
In fact, I can see the wrong surname (AAA\F\aa).
Why does this happen?
How can I solve it?
UPDATE
To get some help, here is the HL7 message that I am trying to read with this library.
Perhaps my syntax is the problem.
MSH|^~\&|SAP|aaa|JCAPS||20210330150502||ADT^A28|0000000111300053|P|2.5||||||UNICODE UTF-8
EVN||20210330150502
PID|||704251200^^^SAP^PI^0001~XXXXXXX^^^SS^SS^066^20210330~""^^^^PRC~""^^^^DL~""^^^^PPN~XXXXXXXX^^^Ministero finanze^NN~""^^^^PNT^^^""~""^^^^NPI^^""^^""&&""^""&""||TEST\F\TEST^TEST2^^^SIG.^""||19610926|M|||^^SANTEUSANIO FORCONESE^^^IT^BDL^^066090~&VIA DELLA PIEGA 12 TRALLaae^""^SANTEUSANIO FORCONESE^AQ^67020^IT^L^^066090^^^^20210330||^ORN^^^^^^^^^^349 6927621~^NET^^""|||2||||||||||IT^^100^Italiana|||""||||20160408
PD1|||AVEZZANO-SULMONA-LAQUILA^^^^^^ASLR^^^130201||||||||""^^HSR^0001^XXXXXXXX^YYYY
NK1|1||SEL|||||""|||07||""^^""
NK1|2|""^""
NK1|3|""^""
NK1|4||SEL|||||19900101||||||||||||||P^Consenso Rilasciato|||||||||||OSR-DSDP^^^^^^19900101
NK1|5||SEL|||||20160408||||||||||||||P^Consenso Rilasciato|||||||||||OSR-TD^^^^^^20160408
NK1|6||SEL|||||20210325||||||||||||||P^Consenso Rilasciato|||||||||||OSR-MI^^^^^^20210325
NK1|7||SEL|||||20210325||||||||||||||P^Consenso Rilasciato|||||||||||OSR-DC^^^^^^20210325
NK1|8||SEL|||||20210325||||||||||||||P^Consenso Rilasciato|||||||||||OSR-RD^^^^^^20210325
PV1||N||||||||||||||N
PV2||||||||||||||||||||||||ATTIVO
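If the component-level value keeps coming back with the raw escape sequences, one possible workaround is to decode the delimiter escapes yourself. The sketch below is a hypothetical helper (DecodeHl7Delimiters is not part of HL7-dotnetcore); it assumes the usual delimiters declared in MSH-1 and MSH-2 (|^~\&) and handles only the five delimiter escapes \F\, \S\, \T\, \R\ and \E\, leaving other sequences such as \X..\ untouched.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of HL7-dotnetcore: decodes the five HL7 v2
// delimiter escape sequences in a raw value. Defaults assume "|" and "^~\&".
static string DecodeHl7Delimiters(string raw, char field = '|', char component = '^',
                                  char repetition = '~', char escape = '\\', char subcomponent = '&')
{
    var sb = new StringBuilder(raw.Length);
    int i = 0;
    while (i < raw.Length)
    {
        // Candidate sequence: escape char, one code letter, escape char.
        if (raw[i] == escape && i + 2 < raw.Length && raw[i + 2] == escape)
        {
            char? decoded = raw[i + 1] switch
            {
                'F' => (char?)field,
                'S' => component,
                'T' => subcomponent,
                'R' => repetition,
                'E' => escape,
                _ => null // other sequences (\X..\, \H\, ...) are copied unchanged
            };
            if (decoded.HasValue)
            {
                sb.Append(decoded.Value);
                i += 3; // skip the whole three-character sequence
                continue;
            }
        }
        sb.Append(raw[i]);
        i++;
    }
    return sb.ToString();
}

Console.WriteLine(DecodeHl7Delimiters(@"AAA\F\aa"));    // AAA|aa
Console.WriteLine(DecodeHl7Delimiters(@"TEST\F\TEST")); // TEST|TEST
```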

Related

How do you delete text surrounding a string that you want?

I've looked online for this but have not been able to find an answer, unfortunately (sorry if there is something I have missed).
I have some code which filters out a specific string (which can change depending on what is read from the serial port). I want to be able to delete all of the characters which I am not using.
e.g. the string I want from the text below is "ThisIsTheStringIWant"
efefhokiehfdThisIsTheStringIWantcbunlokew
Now, I already have a function with some code which will identify this and print it to where I want. However, as the comms could be coming in from multiple ports at any frequency, before printing the string to where I want it, I need to have a piece of code which will recognise everything I don't want and delete it from my buffer.
e.g. Using the same random text above, I want to get rid of the two random strings at the ends (which are before and after "ThisIsTheStringIWant" in the middle).
efefhokiehfdThisIsTheStringIWantcbunlokew
I have tried using the highest-voted answer to the question "Remove characters after specific character in string, then remove substring?", but I can't find a way to delete the unwanted text before my wanted string.
If anyone can help, that would be great!
Thanks!
Edit:
Sorry, I should have probably made my question clearer.
Any number of characters could come before and/or after the string I want, and because that string comes from a serial port, it will be different every time depending on what comms are coming in. In my application I have a cell in a DGV (DataGridView) called "Extract", into which I type the first part of the comms I am expecting (in this case, the extract would be "This"). But that will differ depending on what I am doing.

Find the position of the string you want, delete everything from the beginning up to the character just before that position, then delete everything after the end of your string.
String: efefhokiehfdThisIsTheStringIWantcbunlokew
Step 1 - "ThisIsTheStringIWant" starts at position 13, so delete the first twelve, leaving...
String: ThisIsTheStringIWantcbunlokew
Step 2 - "ThisIsTheStringIWant" is 20 characters long, so delete from character 21 to the length of the string, leaving:
String: ThisIsTheStringIWant
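The two steps above can be sketched in C# with String.IndexOf and String.Remove. C# string positions are zero-based, so "position 13" in the walkthrough is index 12. KeepOnly is a hypothetical name, and the sketch assumes the whole wanted string is known in advance.

```csharp
#nullable enable
using System;

// Hypothetical helper: keep only `wanted` out of `buffer`.
// Returns null when `wanted` does not occur in `buffer`.
static string? KeepOnly(string buffer, string wanted)
{
    int start = buffer.IndexOf(wanted, StringComparison.Ordinal); // zero-based
    if (start < 0)
        return null;
    // Step 1: delete the characters before the wanted string.
    string withoutPrefix = buffer.Remove(0, start);
    // Step 2: delete everything after the wanted string's length.
    return withoutPrefix.Remove(wanted.Length);
}

Console.WriteLine(KeepOnly("efefhokiehfdThisIsTheStringIWantcbunlokew", "ThisIsTheStringIWant"));
// ThisIsTheStringIWant
```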

Alternative for Regex.Unescape in C# for quotes (")

I am facing a problem which I am sure has a simple answer, but I cannot seem to find it.
I am comparing string data from two tables using C# code.
When the data is null or empty in both tables, I want the comparison to return "True", which means they are identical.
I am using string.IsNullOrEmpty to check for null or empty values.
The problem is that in one table the string value is "" while the other table has the same value escaped, appearing as "\"\"".
I assumed Regex.Unescape would solve this, but it does not seem to work: the output says the two values are different, which causes problems.
One solution I figured out is to check directly whether str == "\"\"".
But are there any cleaner options?

I think you are mixing things here.
If your strings come from the same data source, then either all of them are escaped, or they are not (and if that's not the case, you have bigger problems than what you are stating).
So, if they are not escaped, and one of them contains "", and the other one contains \"\", then they are not equal, one is 2 characters in length, and the other one is 4.
So I'm assuming that they are escaped and your first string is actually empty in the database (it doesn't contain any characters), and the second one is \"\".
You can then use Regex.Unescape (if they are always escaped), but those two strings are not the same: one is empty, and the other one contains (once unescaped), "", so the first string contains no characters and the second one has two of them: no wonder they won't be compared equal.
Now, if they are indeed escaped, it does not make sense that one contains "", because those characters should be escaped. And if this is not the case, then you have a very specific problem which is not what you asked for: you need to determine whether your string comes escaped or not from the data source... and that's basically impossible unless there's a very specific set of rules which determine so.
If the data source contains randomly escaped or not strings, imagine your data source returns a string \"\": how do you determine if the actual content is escaped and it means {'"','"'} (2 characters, each of them being a double quote), or if it isn't, and it's 4 characters, representing {'\','"','\','"'} (one backslash, one double-quote, one backslash and one double-quote)? There's just no way to tell unless you have a specification that determines those rules (or another field saying if the string is escaped or not).
So, back to your question: although you haven't posted any code, my guess is that the code is not what is wrong: either your expectations are wrong (you want \"\" to mean a string is empty, but it doesn't), or your data is wrong.
Either way, there's no generic code solution to any of those... there are specific code solutions for specific cases (like the one you are showing), but not a generic one: with the info you gave in your question, it's just impossible.
After all this babbling, now for a specific answer, if your table A contains unescaped strings, and your table B contains escaped strings:
stringFromTableA == Regex.Unescape(stringFromTableB)
This should return true if stringFromTableA contains "" and stringFromTableB contains \"\". Check it. Neither of those is empty, so string.IsNullOrEmpty() will return false for both.
And an update: if you are checking those string values in the Visual Studio debugger, note that the debugger shows them escaped. So if you are seeing "" in one and \"\" in the other, then your first string is empty (and string.IsNullOrEmpty will return true), and your second string contains two double quotes: string.IsNullOrEmpty will return false, since it is not actually null or empty. Regex.Unescape will do nothing in this case, since your string doesn't contain any \ characters and there is nothing to unescape; it's just the debugger showing those \'s.
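A minimal sketch of the cases discussed above, assuming table A holds the raw value and table B holds the escaped one (the variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

// Table A: the raw value, two double-quote characters.
string fromTableA = "\"\"";   // 2 chars: ""
// Table B: the same value escaped, i.e. backslash, quote, backslash, quote.
string fromTableB = @"\""\"""; // 4 chars: \"\"

Console.WriteLine(fromTableA.Length);                        // 2
Console.WriteLine(fromTableB.Length);                        // 4
Console.WriteLine(fromTableA == Regex.Unescape(fromTableB)); // True
Console.WriteLine(string.IsNullOrEmpty(fromTableA));         // False

// A truly empty string has no characters; the debugger displays it as "".
string empty = "";
Console.WriteLine(string.IsNullOrEmpty(empty));              // True
```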

How to carriage return C# string without \r\n?

This is my problem.
A user can enter text into a text area in the browser. Which is then emailed out to users.
What I want to know is that how do I handle carriage return? If I enter \r\n for carriage return, the email (which is a plain text email) has actual \r\n in it.
In other words:
On the SQL Server end:
Case 1:
if I do this before the email gets sent
(notice the line break after line 1)
update emails
set
body='line 1
line 2'
where
id=100
the email goes out correctly
Case 2:
update emails
set
body='line 1'+char(13) + char(10) +'line 2'
where
id=100
This email also goes out correctly
Case 3:
However if I do this
update emails
set
body='line 1 \r\n line 2'
where
id=100
the email would have the actual text \r\n in it.
How do I simulate case 1/2 through c# ?

SQL literals (at least those in SQL Server) do not support such escape sequences (although you can just hit enter within the string literal so that it spans multiple lines). See this answer for some alternatives if writing it as an SQL string is a requirement.
If running the SQL programmatically from C#, use parameters which will handle this just fine:
sqlCommand.CommandText = "update emails set body=@body where id=@id";
sqlCommand.Parameters.AddWithValue("@body", "line 1 \r\n line 2");
sqlCommand.Parameters.AddWithValue("@id", 100);
Note that the handling of the string literal (and conversion of the \r and \n character escape sequences) happens in C# and the value (with CR and LF characters) is passed to SQL.
If the above didn't address the problem, keep reading.
4.10.13 The textarea element:
For historical reasons, the element's value is normalised in three different ways for three different purposes. The raw value is the value as it was originally set. It is not normalized. The API value is the value used in the value IDL attribute. It is normalized so that line breaks use "LF" (U+000A) characters. Finally, there is the form submission value. [Upon form submission the textarea] is normalized so that line breaks use U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pairs, and in addition, if necessary given the element's wrap attribute, additional line breaks are inserted to wrap the text at the given width.
Note that CR and LF represent characters and not the two-character sequence of \ followed by either the r or n characters - this form is often found in string literals. If it appears as such then something is doing the incorrect conversion and putting (or leaving) the \ there. Or, perhaps there is some misguided "add slashes" hack somewhere?
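To see the difference between the CR/LF characters and the two-character sequences, compare a regular string literal with a verbatim one:

```csharp
using System;

string escaped = "\r\n";   // regular literal: the compiler turns \r and \n into CR and LF
string verbatim = @"\r\n"; // verbatim literal: backslash, 'r', backslash, 'n' kept as text

Console.WriteLine(escaped.Length);      // 2
Console.WriteLine((int)escaped[0]);     // 13 (U+000D CARRIAGE RETURN)
Console.WriteLine((int)escaped[1]);     // 10 (U+000A LINE FEED)
Console.WriteLine(verbatim.Length);     // 4
Console.WriteLine(verbatim[0] == '\\'); // True
```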
As pointed out, while URL decode is likely wrong, it won't directly do this conversion. However, if the conversion happened previously before being "URL Encoded", then it will (correctly) decode to (incorrect) values.
In either case, it's a bug. So find out where the incorrect data conversion is introduced and fix it (attach a debugger and/or monitor the network traffic for clues) - the required information to isolate where is simply not present in the post.

Use C#'s String.Replace method to replace "\\r\\n" with "\r\n", and that should fix it.
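A minimal sketch of that replacement, assuming the body arrives containing the literal four characters \r\n. This repairs the symptom only; the incorrect conversion upstream is still worth tracking down.

```csharp
using System;

// What arrives when something upstream left the escape sequence as text:
// backslash, 'r', backslash, 'n' (4 characters), not CR + LF.
string received = @"line 1\r\nline 2";

// Replace the literal sequence with real CR and LF characters.
string fixedBody = received.Replace(@"\r\n", "\r\n");

Console.WriteLine(received.Contains("\n"));    // False
Console.WriteLine(fixedBody.Contains("\r\n")); // True
Console.WriteLine(fixedBody);                  // prints "line 1" and "line 2" on separate lines
```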

What is the best character for String.Split?

Disclaimer: I KNOW that in 99% of cases you shouldn't "serialize" data in a concatenated string.
What character do you use in this well-known situation?
string str = userId +"-"+ userName;
In most cases I have fallen back to | (pipe), but in some cases users type even that. What about "non-typable" characters like ☼ (ALT+9999)?

That depends on too many factors to give a concrete answer.
Firstly, why are you doing this? If you feel the need to store the userId and userName by combining them in this fashion, consider alternative approaches, e.g. CSV-style quoting or similar.
Secondly, under normal circumstances only delimiters that aren't part of the strings should be used. If userId is just a number then "-" is fine... but what if the number could be negative?
Third, it depends on what you plan to do with the string. If it is simply for logging, debugging or some other form of human consumption, then you can relax a bit and just choose a delimiter that looks appropriate. If you plan to store data like this, use a delimiter that ensures you can extract the data properly later on, regardless of the values of userId or userName. If you can get away with it, use \0, for example. If either value comes from an untrusted source (e.g. the Internet), make sure the delimiter can't be used as a character in either string. Generally you would limit the characters that each contains: say, digits for userId, and letters, digits and SOME punctuation characters for userName.
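A sketch of the \0 approach combined with the validation described above. PackUser and UnpackUser are hypothetical names; the check rejects any value that contains the delimiter instead of trying to escape it.

```csharp
using System;

// Hypothetical helper: join two values with '\0', rejecting input that contains it.
string PackUser(string userId, string userName)
{
    if (userId.IndexOf('\0') >= 0 || userName.IndexOf('\0') >= 0)
        throw new ArgumentException("Values must not contain the '\\0' delimiter.");
    return userId + '\0' + userName;
}

// Hypothetical helper: split at the first '\0'; validation guarantees exactly one.
(string UserId, string UserName) UnpackUser(string packed)
{
    string[] parts = packed.Split(new[] { '\0' }, 2);
    return (parts[0], parts[1]);
}

var (id, name) = UnpackUser(PackUser("42", "alice|bob"));
Console.WriteLine($"{id} / {name}"); // 42 / alice|bob
```

Values containing "-" or "|" (including a negative userId) round-trip safely, because only '\0' is reserved.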

If it's for data storage and retrieval, there is no way to guarantee that a user won't find a way to inject your delimiter into the string. The safe thing to do is pre-process the input somehow:
Let - be the special character
If a - is encountered in the input, replace it with something like -0.
Use -- as your delimiter
So userid = "alpha-dog" and userName = "papa--0bear" will be translated to
alpha-0dog--papa-0-00bear
The important thing is that your scheme needs to be perfectly undoable, and that the user shouldn't be able to break it, no matter what they enter.
Essentially this is a very primitive version of sanitization.
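The scheme above can be sketched as a pair of functions. Pack and Unpack (and the two escape helpers) are hypothetical names. Decoding relies on the fact that, after escaping, every - inside a value is followed by 0, so -- can only ever be a delimiter.

```csharp
using System;
using System.Linq;

// Escape each '-' as "-0", so "--" can never occur inside an escaped value.
static string EscapeDash(string value) => value.Replace("-", "-0");

// Reverse of EscapeDash: every '-' in an escaped value is followed by '0'.
static string UnescapeDash(string value) => value.Replace("-0", "-");

// Join the escaped values with the delimiter "--".
static string Pack(params string[] values) => string.Join("--", values.Select(EscapeDash));

// Split on "--" (never present inside an escaped value), then unescape each part.
static string[] Unpack(string packed) =>
    packed.Split(new[] { "--" }, StringSplitOptions.None).Select(UnescapeDash).ToArray();

string packed = Pack("alpha-dog", "papa--0bear");
Console.WriteLine(packed);                             // alpha-0dog--papa-0-00bear
Console.WriteLine(string.Join(" | ", Unpack(packed))); // alpha-dog | papa--0bear
```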

Newlines escaped unexpectedly in C#/ASP.NET 1.1 code

Can someone explain to me why my code:
string messageBody = "abc\n" + stringFromDatabaseProcedure;
where stringFromDatabaseProcedure is a value from the SQL database entered as
'line1\nline2'
results in the string:
"abc\nline1\\nline2"
This has resulted in me scratching my head somewhat.
I am using ASP.NET 1.1.
To clarify,
I am creating a string that needs to go into the body of an email on form submit.
I mention ASP.NET 1.1 as I do not get the same result using .NET 2.0 in a console app.
All I am doing is adding the strings and when I view the messageBody string I can see that it has escaped the value string.
Update
What is not helping me at all is that Outlook does not show the \n in a text email correctly (unless you reply to or forward it).
An online mail viewer (even the Exchange webmail) shows \n as a new line as it should.

I just did a quick test on a test NorthwindDb and put in some junk data with a \n in middle. I then queried the data back using straight up ADO.NET and what do you know, it does in fact escape the backslash for you automatically. It has nothing to do with the n it just sees the backslash and escapes it for you. In fact, I also put this into the db: foo"bar and when it came back in C# it was foo\"bar, it escaped that for me as well. My point is, it's trying to preserve the data as is on the SQL side, so it's escaping what it thinks it needs to escape. I haven't found a setting yet to turn that off, but if I do I'll let you know...

ASP.NET would use <br /> to make linebreaks. \n would work with Console Applications or Windows Forms applications. Are you outputting it to a webpage?
Method #1
string value = "line1<br />line2";
string messageBody = "abc<br />" + value;
If that doesn't work, try:
string value = "line1<br>line2";
string messageBody = "abc<br>" + value;
Method #2
Use System.Environment.NewLine:
string value = "line1"+ System.Environment.NewLine + "line2";
string messageBody = "abc" + System.Environment.NewLine + value;
One of these ways is guaranteed to work. If you're outputting a string to a Webpage (or an email, or a form submit), you'd have to use one of the ways I mentioned. The \n will never work there.

You need to set a watch and see where exactly your database result string gets double escaped.
Adding two strings together will never double escape strings, so it's either happening before that, or after that.

When I get the string out of the database, .NET escapes it automagically. However, the little @ symbol is prefixed to the string, which I did not notice.
So it appeared to be non-escaped to my "about to go on holiday" eye inside the IDE.
Therefore when the non-escaped \n was added to the string (as the whole string is no longer escaped), it would remove the @ and show the database portion of the string escaped.
Gah, it was all an illusion.
Perhaps that holiday is overdue.
Thanks for your input.

If the actual string stored in the database is (spaces added for emphasis) "l i n e 1 \ n l i n e 2", then whatever stored it there probably has a bug. But assuming that is the exact string there, "abc\nline1\\nline2" is what you see when a debugger that escapes strings displays a value which would print as "abc", a line break, and then "line1\nline2" (this escaping is a convenience, allowing you to copy-paste out of the debugger straight into code without errors).
Short answer: .NET is not escaping the string, your debugger is. The code which writes a literal "\n" into the database has a bug.
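To illustrate, the sketch below builds the same string and prints it twice: once as is, and once through a hypothetical ToDisplayLiteral helper that escapes it roughly the way a debugger displays C# strings (it handles only backslash, double quote, CR and LF).

```csharp
using System;
using System.Text;

// Hypothetical helper: render a string as a C# literal, as a debugger would display it.
// It only changes how the value is shown; the string itself is untouched.
static string ToDisplayLiteral(string s)
{
    var sb = new StringBuilder("\"");
    foreach (char c in s)
    {
        switch (c)
        {
            case '\\': sb.Append(@"\\"); break;
            case '"':  sb.Append("\\\""); break;
            case '\n': sb.Append(@"\n"); break;
            case '\r': sb.Append(@"\r"); break;
            default:   sb.Append(c); break;
        }
    }
    return sb.Append('"').ToString();
}

// The database value holds a literal backslash followed by 'n', not a newline.
string fromDatabase = @"line1\nline2";
string messageBody = "abc\n" + fromDatabase;

Console.WriteLine(messageBody);                   // "abc", a line break, then line1\nline2
Console.WriteLine(ToDisplayLiteral(messageBody)); // "abc\nline1\\nline2"
```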
