Create Variable from part of file - c#

Hi, I'm having trouble working out where to start with creating a variable from a file created by Windows Remote Assistance. I need to extract the port from the text file so I can create an SSH tunnel allowing remote assistance from anywhere.
The port appears after the IP address in 'RCTICKET="65538,1,192.168.9.22:7532,'. The colon is the first one in the whole file, so I think I need to search for the first ":" and then copy the 4 digits that come after it, unless the port is 5 digits (I think I could check whether the 5th character is a comma, meaning a 4-digit port, or a digit, meaning a 5-digit port).
Any help on where to start with this? I've been googling for hours but just can't think how to put it into a search term.
Below is an example of test.msrcincident, the file created by Microsoft Remote Assistance that I need to extract the port from:
<?xml version="1.0"?>
<UPLOADINFO TYPE="Escalated"><UPLOADDATA USERNAME="jon" LHTICKET="BDF9C9782B31A1BC276C029A169930ABB4490E2088169FA45A3A095258F5C54D345F4D793363E2C9 B924C5D6A38210AF2E86B3E3D33E5BEB3E35729ECDA88D5F5CE23879899768432726AF419FA2147194F4358BA2A0F245C4307EC8CAB882E2B670977562E5423C90EC336A15BA3DC57496F1EBB26B55B449B45FBD317CD4E422186EA7989F78C6FC3019BCF5831B1E060B174C5254D92448992A543079E576A66617F8B5BEA4C5961FC75C0B67F28B996CD4F1247DBC1C725B9D69B094B53AE24A533501A607CF119ED99C34F0C7210376C6564A48E25871AA32934409D981CF63F60DA956B0877AFBD669DFC321D16D55A34B9949AE0B26B6EEB473915AC416ABFC1129C08021F4011F1F0D1869BB86842C0218C03286C956FC7897B319E0B3A495EBA8ED41835E84E6BAD6B30199F6ACF191B6529DF2C5A264F578AF3B31A84997DA9C4BF1F8AD9E4931F99AE94A0E66D941F050AC0B025523148A95D24E60A6C548341C486BB40089B2088F5FE49AC966D65B728E36E0D7D76C98827335983BEC912DFC0B714DBBBFA060DE62658E7BABDB9BEB45486138950548DA62FDFD6437D0798A67D20CA1911880F58FCDA5F98FA5E0CAEF643171FE9DA8AF046" RCTICKET="65538,1,192.168.9.22:7532,*,U15FphW2EDtpPVdlHmafYLmnO/aVc+YFoFEw30tpjJ+6vJ+LspOTtaqgFoDt3bsp,*,*,P1ooZJPDyfMMTXqlz5hACdwD8F4=" PassStub="TE*0ViGNuB2T6I" RCTICKETENCRYPTED="1" DtStart="1379526042" DtLength="360" L="0"/></UPLOADINFO>
Thank you for reading
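The approach described in the question can be sketched with IndexOf: find the first colon, then read digits until the first non-digit. Stopping at the first non-digit (rather than counting 4 characters) means 4-digit and 5-digit ports need no separate check. This is a minimal sketch that assumes, as the question observes, that the first colon in the file is the one inside RCTICKET; FirstPortFinder is a hypothetical name.

```csharp
using System;

public static class FirstPortFinder
{
    // Returns the number that follows the first ':' in the text, or null if
    // there is no colon or no digits follow it.
    public static int? FindFirstPort(string text)
    {
        int colon = text.IndexOf(':');
        if (colon < 0) return null;

        // Walk forward over digits only, so the port may be any length and
        // the scan stops at the ',' (or any other separator) after it.
        int end = colon + 1;
        while (end < text.Length && char.IsDigit(text[end])) end++;
        if (end == colon + 1) return null;

        return int.Parse(text.Substring(colon + 1, end - colon - 1));
    }
}
```

For example, FindFirstPort(File.ReadAllText(path)) on the sample file would read the digits between "192.168.9.22:" and the following comma.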

Something simple like this would get you the data you need, to some extent:
using System.Xml.Linq;

var reader = XDocument.Load("path to XML file");
var data = reader.Element("UPLOADINFO")
                 .Element("UPLOADDATA")
                 .Attribute("RCTICKET")
                 .Value;              // the attribute's text, not the XAttribute object
var values = data.Split(',');
You will need to work with that RCTICKET string to extract the value you need. It would be a bit safer to work with commas, colons, and whatnot in the context of a single attribute instead of the whole file. Caveat: When I generated an incident file, I ended up with multiple IP addresses in the RCTICKET field. I have multiple VPNs and ethernet adapters in my machine. You will have to pick the right one.
You will also want to handle failures if the XML isn't in the format we expect, or if the file is otherwise inaccessible. You can do this with a try/catch and/or checking for nulls.
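Building on this answer, a regular expression can pull every IP:port pair out of the RCTICKET attribute, which covers the case of several addresses in one ticket without depending on how they are separated (that separator isn't shown in the question). This is a sketch, not a definitive parser: it assumes IPv4 addresses, and RcTicketPorts is a hypothetical name. Missing elements or attributes are turned into a FormatException, in the spirit of the null checks suggested above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class RcTicketPorts
{
    // An IPv4 address, a colon, then 1 to 5 digits for the port.
    private static readonly Regex AddressAndPort =
        new Regex(@"(?<ip>\d{1,3}(?:\.\d{1,3}){3}):(?<port>\d{1,5})");

    // Returns every (IP, port) pair found in UPLOADINFO/UPLOADDATA/@RCTICKET.
    public static List<(string Ip, int Port)> Extract(XDocument doc)
    {
        string ticket = doc.Element("UPLOADINFO")
                           ?.Element("UPLOADDATA")
                           ?.Attribute("RCTICKET")
                           ?.Value
            ?? throw new FormatException("RCTICKET attribute not found.");

        var results = new List<(string Ip, int Port)>();
        foreach (Match m in AddressAndPort.Matches(ticket))
            results.Add((m.Groups["ip"].Value, int.Parse(m.Groups["port"].Value)));
        return results;
    }
}
```

Usage would be RcTicketPorts.Extract(XDocument.Load(path)); when more than one pair comes back, you still have to choose the adapter you want, as the answer notes.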

Related

C# Get Substring from String using Regex

I have a log from a firewall which is in what I think is a horrible format, but the information I actually want to extract is fairly consistently delimited. An example (I've removed all the specific information for privacy) would be:
<46>Nov7 04:33:25 FirewallDeviceName [Some identifier from the firewall, can contain spaces]: in:[InterfaceName] out:[InterfaceName], connection-state:new src-mac [Mac-ID], proto UDP, [SourceIP]:[SourcePort]->[Dst-IP]:[Dst-Port], len 32
What I want to extract from this is just the Source and Destination IP Addresses and ports, and maybe also the In and Out interfaces, and maybe the protocol.
I thought that the best way to do this would be to use a combination of .Substring(pos, length) and .IndexOf(char) with RegEx to match the bits of the string that I need for each.
For example:
\s[0-9]+\. would get the part of the string where the source-IP starts.
[0-9]+\, would get the end of the section containing the IP-Addresses
I could then Split() this on "->" to separate the source and destination, and then split each of those on ":" to separate the IP address from the port.
The bit I don't know is how to use RegEx within either IndexOf (to get the character position) or Substring, or even whether that's possible.
Any help or advice would be appreciated.
What I'm basically looking to write initially is a parser to parse some text-file logs that I've generated (these were generated from a syslog listener that I wrote for our new firewall, to work out what the output looked like)... ultimately the parser will be built into the listener itself so that the bits I want are logged directly to an SQL Database, but that bit I can do... it's the parser with the Regex that I'm not sure about.
Thanks very much.
Based on the example text given, the RegEx
, (\[SourceIP\]:\[SourcePort\])->(\[Dst-IP\]:\[Dst-Port\]),
will capture the source and destination into $1 and $2. However, I suspect that they are actually numbers with dots and not words within square brackets. Thus a better expression may be
, ([\d.]+:\d+)->([\d.]+:\d+),
This RegEx matches the two parts within proto UDP, 1.2.3.4:567->8.9.0:123, len 32.
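On the IndexOf/Substring part of the question: a Match, and each of its Groups, already exposes Value, Index and Length, so the regex gives you the positions directly. The sketch below extends the answer's pattern with named groups for the interfaces and protocol. It assumes the field order shown in the question's sample line, and since the real values were removed there, the test line uses made-up values; FirewallLogParser is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class FirewallLogParser
{
    // Named groups: in/out interfaces, protocol (everything up to the next comma),
    // then source and destination IP:port around the "->" arrow.
    private static readonly Regex Pattern = new Regex(
        @"in:(?<in>.+?) out:(?<out>[^,]+),.*?proto (?<proto>[^,]+), " +
        @"(?<srcIp>[\d.]+):(?<srcPort>\d+)->(?<dstIp>[\d.]+):(?<dstPort>\d+),");

    // Check Match.Success before reading the groups. Each Group also has
    // Index and Length, the same numbers IndexOf/Substring would give you.
    public static Match Parse(string line) => Pattern.Match(line);
}
```

For example, after var m = FirewallLogParser.Parse(line), m.Groups["srcIp"].Value is the source address, and line.Substring(m.Groups["srcIp"].Index, m.Groups["srcIp"].Length) returns the same text.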

Making strings that may include utf-8 characters safe for MySQL in C#.Net

I have a program written in C# .NET that downloads our Amazon.com orders and stores them in our local databases.
I ran into an issue where a customer who purchased a product entered a utf8 character (℅) - (\xe2\x84\x85) into an address. This seems like a pretty reasonable thing to do, but my program choked when it ran across this order until I put in the following fix.
// Get the Address2 subnode
XmlNode Address2Node = singleOrder.SelectSingleNode("ShippingAddress/AddressLine2");
if (Address2Node != null)
{
    // InnerText returns the unescaped text; InnerXml would keep entities such as &amp;
    GlobalClass.Address2 = Address2Node.InnerText;
    //** c/o Unicode character messed up program.
    if (GlobalClass.Address2.Contains("℅"))
    {
        GlobalClass.Address2 = GlobalClass.Address2.Replace("℅", "c/o");
        // Console.WriteLine(GlobalClass.Address2.Substring(0,1));
    }
    // Manual quote escaping for a concatenated SQL string
    // (a parameterized query would make this unnecessary)
    GlobalClass.Address2 = GlobalClass.Address2.Replace("'", "''");
}
else
{
    GlobalClass.Address2 = "";
}
Obviously, this will only work for this one field and this one UTF-8 character. Without this fix, when I tried to insert the record into MySQL, I received an error that basically amounted to there being an error in my MySQL statement: by the time the statement reached MySQL, it was trying to INSERT a record with a string like '\xE2\x84\x85...' plus the rest of the string.
Clearly I have no control over what Amazon is going to allow in the shipping address fields, so I need to account for any odd characters that may come through but I have no idea how to do that. I had hoped that just allowing for utf8 in my connection string (charset=utf8;) would fix it, but that didn't do anything - still had the same error. Perhaps my Google skills are lacking, but I can't seem to find a way to allow for any odd character that may come my way and I don't want to have to wait until someone types it to fix the error.
UPDATE:
What about sending "SET NAMES utf8" as a query? This is sort of out of my MySQL knowledge and I don't want to mess anything up, but would this work? And if so, would all programs that I have that use this database need to send that same query?
UPDATE 2: For those who keep asking for the exception error message, it is:
'MySql.Data.MySqlClient.MySqlException' occurred in MySql.Data.dll
Additional information: Incorrect string value: '\xE2\x84\x85 Yo...' for column 'ShipAddressLine2' at row 1
UPDATE 3: From this discussion: SET NAMES utf8 in MySQL? I tried sending "SET NAMES 'cp1250'" and was surprised to see that this allowed the insert to go through with the ℅ character in it. I gather that if I send "SET CHARSET 'utf8'" as a query before the MySQL query that retrieves the data, perhaps I will get the correct character back? I'm encouraged that the insert went through after sending "SET NAMES 'cp1250'", but I want to know which encoding to use (CP1250 is Eastern European, and while we have customers from around the globe, most of our customers are in the United States) and make sure this is sound practice before I change all my programs to include it. Anybody?
In case someone else has this issue, I first managed to avoid the error by sending the MySQL Command: SET NAMES 'latin1' to the server before storing data. This allows any of the utf8 characters to be stored without causing a MySQL error (I tested it with several odd characters). This, however, stored the utf8 characters in a cryptic format, so I am going with a better answer below:
In my current solution, I edited the MySQL table and changed the character set for the relevant column that might receive utf8 data. I changed the column's character set to utf8mb4 and the column's collation to utf8mb4_general_ci. This allowed the data to be stored properly so the utf8 characters are correct.
In addition, when setting the connection string, I added charset=utf8mb4;.
string MyConString = "SERVER=*****;" + "DATABASE=******;" + "UID=********;" + "PASSWORD=*********;" + "charset = utf8mb4;" ;
Although, as far as I can tell, it saves the content to the field the same way whether I include the charset= parameter or not.
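The bytes in the error message, \xE2\x84\x85, are exactly the UTF-8 encoding of ℅ (U+2105). MySQL raises "Incorrect string value" when the column's character set cannot represent the characters it receives; the question doesn't say what the column used before, but presumably it was a non-Unicode set such as latin1. Note that MySQL's `utf8` is an alias for utf8mb3, which stores at most 3 bytes per character: ℅ fits, but characters outside the Basic Multilingual Plane (emoji, for example) take 4 bytes, which is why utf8mb4 is the safer column choice. The helpers below are a hypothetical way to check this from C#.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf8Check
{
    // Hex dump of the UTF-8 bytes of a string, e.g. "E2-84-85" for U+2105,
    // for comparing against the bytes quoted in the MySQL error.
    public static string Utf8Hex(string s) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(s));

    // True if the string contains a character outside the Basic Multilingual Plane.
    // .NET stores such characters as a surrogate pair, and UTF-8 needs 4 bytes for
    // them, so they fit in a utf8mb4 column but not in a 3-byte utf8 (utf8mb3) one.
    public static bool NeedsUtf8mb4(string s) => s.Any(char.IsHighSurrogate);
}
```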

How do you delete text surrounding a string that you want?

I've looked online for this but not been able to find an answer unfortunately (sorry if there is something I have missed).
I have some code which filters out a specific string (which can change depending on what is read from the serial port). I want to be able to delete all of the characters which I am not using.
e.g. the string I want from the text below is "ThisIsTheStringIWant"
efefhokiehfdThisIsTheStringIWantcbunlokew
Now, I already have a function with some code which will identify this and print it to where I want. However, as the comms could be coming in from multiple ports at any frequency, before printing the string to where I want it, I need to have a piece of code which will recognise everything I don't want and delete it from my buffer.
e.g. Using the same random text above, I want to get rid of the two random strings at the ends (which are before and after "ThisIsTheStringIWant" in the middle).
efefhokiehfdThisIsTheStringIWantcbunlokew
I have tried using the highest-voted answer from the question "Remove characters after specific character in string, then remove substring?", but I can't find a way to delete the unwanted text before my wanted string.
If anyone can help, that would be great!
Thanks!
Edit:
Sorry, I should have probably made my question clearer.
Any number of characters could come before and/or after the actual string I want, and because the string I want comes from a serial port, it will be different every time depending on what comms are coming in. In my application I have a cell in a DGV (DataGridView) called "Extract" where I type the first part of the comms I am expecting (in this case, the extract would be "This"). That will be different depending on what I am doing.
Find the position of the string you want, delete from the beginning to the predecessor of that position, then delete everything from the length of your string to the end.
String: efefhokiehfdThisIsTheStringIWantcbunlokew
Step 1 - "ThisIsTheStringIWant" starts at position 13, so delete the first twelve, leaving...
String: ThisIsTheStringIWantcbunlokew
Step 2 - "ThisIsTheStringIWant" is 20 characters long, so delete from character 21 to the length of the string, leaving:
String: ThisIsTheStringIWant
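The two steps above map onto IndexOf and String.Remove. Note that C# positions are 0-based, so IndexOf returns 12 for the example where the answer counts position 13. The second method covers the case from the question's edit, where only the beginning of the message (the "Extract" prefix) is known: it drops everything before the prefix. This is a sketch; BufferTrimmer and its method names are hypothetical.

```csharp
using System;

public static class BufferTrimmer
{
    // Step 1 and Step 2 from the answer: remove everything before the wanted
    // text, then everything after it. Returns null if it isn't in the buffer.
    public static string KeepOnly(string buffer, string wanted)
    {
        int start = buffer.IndexOf(wanted, StringComparison.Ordinal); // 0-based
        if (start < 0) return null;
        string trimmed = buffer.Remove(0, start);   // Step 1: drop the leading junk
        return trimmed.Remove(wanted.Length);       // Step 2: drop the trailing junk
    }

    // When only the start of the message is known, drop everything before it
    // and keep the rest. Returns null if the prefix isn't in the buffer.
    public static string DropBefore(string buffer, string prefix)
    {
        int start = buffer.IndexOf(prefix, StringComparison.Ordinal);
        return start < 0 ? null : buffer.Substring(start);
    }
}
```

With only a prefix, where the message ends still has to come from something else in the protocol (a length or a terminator), which the question doesn't specify.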

c#.net regex to remove certain non ascii chars does not work

I'm new to .NET and I'm using a Script Task in SSIS. I am trying to load a file into a database, and the file has some characters like the one below. It looks like data copied from Word, where - has turned into –
Sample text:
Correction – Spring Promo 2016
Notepad++ shows:
I used the regex [^\x00-\x7F] in the .NET script, but even though the characters fall in that range they get replaced. I do not want these characters to be altered. What am I missing here?
If I don't replace them, I get a truncation error, as I believe these characters take more than one byte.
Edit: I added sample rows. The first two rows have the problem and the last two are okay.
123|NA|0|-.10000|Correction – Spring Promo 2016|.000000|gift|2013-06-29
345|NA|1|-.50000|Correction–Spring Promo 2011|.000000|makr|2012-06-29
117|ER|0|12.000000|EDR - (WR) US STATE|.000000|TEST MARGIN|2016-02-30
232|TV|0|.100000|UFT / MGT v8|.000000|test. second|2006-06-09
After a good long weekend :) I am beginning to think this is due to a code page error. The exact error message when loading the flat file is below.
Error: Data conversion failed. The data conversion for column "NAME" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
This is what I do in my SSIS package.
Script task that validates the flat files.
The only validation that affects the contents of the file checks that the number of delimited columns in each line matches what it should be for that file. I read each line, and if there is an extra pipe delimiter (from user entry), I remove that line from the file and log it to a custom table.
Using the StreamWriter class, I write all the valid lines to a temp file and rename/move the file at the end.
Apologies, but I have just noticed that this process changes all such lines above to something like this:
Notepad: Correction � Spring Promo 2016
How do I stop my script task doing this? (which should be the solution)
If that's not easy, option 2 would be:
My connection managers are flat file source and OLEDB destination. The OLEDB uses the default code page which is 1252. If these characters are not a match in code page 1252, what should I be using? Are there any other workarounds without changing the code page?
Script task:
foreach (string file in files) // ... some other checks
{
    var tFile = Path.GetTempFileName();
    using (StreamReader rFile = new StreamReader(file))
    using (var swriter = new StreamWriter(tFile))
    {
        string line;
        while ((line = rFile.ReadLine()) != null)
        {
            // Keep only lines with the expected number of pipe-delimited columns
            NrDelimtrInLine = line.Count(x => x == '|') + 1;
            if (columnCount == NrDelimtrInLine)
            {
                swriter.WriteLine(line);
            }
        }
    }
}
Thank you so much.
It's not clear to me what you intend since "I do not want these characters to be altered" seems mutually exclusive with "they must be replaced to avoid truncation". I would need to see the code to give you further advice.
In general I recommend always testing your regex patterns outside of code first. I usually use http://regexr.com
If you want to match your special characters:
If you want to match anything except your special characters:
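The Notepad output in the question is a useful clue: � is how the Unicode replacement character U+FFFD is displayed. That is consistent with the file being Windows-1252, where – is the single byte 0x96 (not valid UTF-8 on its own), and StreamReader decoding it as UTF-8, its default; StreamWriter then writes the replacement character back out. A minimal sketch, assuming the input really is Windows-1252 (plausible given the 1252 code page on the connection, but not confirmed in the question), passes the same explicit encoding to both reader and writer so the bytes pass through unchanged. FlatFileFilter and its methods are hypothetical names.

```csharp
using System.IO;
using System.Linq;
using System.Text;

public static class FlatFileFilter
{
    // Copies only the lines with the expected number of pipe-delimited columns,
    // reading and writing with the same encoding so non-ASCII bytes survive.
    public static void CopyValidLines(string inputPath, string outputPath,
                                      int columnCount, Encoding encoding)
    {
        using (var reader = new StreamReader(inputPath, encoding))
        using (var writer = new StreamWriter(outputPath, false, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                int columns = line.Count(c => c == '|') + 1;
                if (columns == columnCount)
                    writer.WriteLine(line);
            }
        }
    }

    public static Encoding Windows1252()
    {
        // .NET Core / .NET 5+ only provide legacy code pages through this provider.
        // On .NET Framework (which SSIS Script Tasks use) code page 1252 is built in
        // and this line is not needed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }
}
```

In the Script Task this would replace the two constructors, e.g. CopyValidLines(file, tFile, columnCount, Encoding.GetEncoding(1252)), before the rename/move step.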

String processing / CSV challenge

Having used SQL Server Bulk insert of CSV file with inconsistent quotes (CsvToOtherDelimiter option) as my basis, I discovered a few weirdnesses with the RemoveCSVQuotes part [it chopped the last character from quoted strings that contained a comma!]. So I rewrote that bit (maybe a mistake?).
One wrinkle is that the client has asked 'what about data like this?'
""17.5179C,""
I assume that if I wanted to keep using the CsvToOtherDelimiter solution, I'd have to amend the regex... but it's WAY beyond me. What's the best approach?
To clarify: we are using C# to pre-process the file into a pipe-delimited format prior to running a bulk insert using a format file. Speed is pretty vital.
The accepted answer from your link starts with:
You are going to need to preprocess the file, period.
Why not transform your csv to xml? Then you would be able to verify your data against an xsd before storing into a database.
To convert a CSV string into a list of elements, you could write a program that keeps track of state (in quotes or out of quotes) as it processes the string one character at a time, and emits the elements it finds. The rules for quoting in CSV are weird, so you'll want to make sure you have plenty of test data.
The state machine could go like this:
1. Scan until quote (go to 2) or comma (go to 3).
2. If the next character is a quote, add only one of the two quotes to the field and return to 1. Otherwise, go to 4 (or report an error if the quote isn't the first character in the field).
3. Emit the field, go to 1.
4. Scan until quote (go to 5).
5. If the next character is a quote, add only one of the two quotes to the field and return to 4. Otherwise, emit the field, scan for a comma, and go to 1.
This should correctly scan stuff like:
hello, world, 123, 456
"hello world", 123, 456
"He said ""Hello, world!""", "and I said hi"
""17.5179C,"" (correctly reports an error, since there should be a separator between the first quoted string "" and the second field 17.5179C).
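The five states above can be sketched in C# for a single line. The answer leaves a few cases open, so this sketch makes its own assumptions explicit: a quote that is the first non-space character of a field opens a quoted field (checked before the doubled-quote rule, so "" at the start of a field is an empty quoted string); spaces before an opening quote or after a closing quote are skipped; and a closing quote must be followed by a comma or the end of the line, otherwise a FormatException is thrown. That last rule is what rejects ""17.5179C,"". CsvLineParser is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLineParser
{
    public static List<string> Parse(string line)
    {
        var fields = new List<string>();
        int i = 0;
        while (true)
        {
            var field = new StringBuilder();
            int start = i;
            while (i < line.Length && line[i] == ' ') i++;    // look past leading spaces

            if (i < line.Length && line[i] == '"')
            {
                i++;                                           // opening quote (2 -> 4)
                while (true)
                {
                    if (i >= line.Length)
                        throw new FormatException("Unterminated quoted field.");
                    if (line[i] == '"')
                    {
                        // State 5: "" is a literal quote; a single quote closes the field.
                        if (i + 1 < line.Length && line[i + 1] == '"')
                        {
                            field.Append('"');
                            i += 2;
                            continue;
                        }
                        i++;
                        break;
                    }
                    field.Append(line[i++]);                   // State 4: ordinary character
                }
                while (i < line.Length && line[i] == ' ') i++;
                if (i < line.Length && line[i] != ',')
                    throw new FormatException($"Expected a comma at position {i}.");
            }
            else
            {
                i = start;                                     // unquoted: keep leading spaces
                while (i < line.Length && line[i] != ',')      // State 1
                {
                    if (line[i] == '"')
                    {
                        // State 2 mid-field: only "" (one literal quote) is allowed.
                        if (i + 1 < line.Length && line[i + 1] == '"')
                        {
                            field.Append('"');
                            i += 2;
                            continue;
                        }
                        throw new FormatException($"Unexpected quote at position {i}.");
                    }
                    field.Append(line[i++]);
                }
            }

            fields.Add(field.ToString());                      // State 3: emit the field
            if (i >= line.Length) return fields;
            i++;                                               // skip the comma, back to 1
        }
    }
}
```

Unquoted fields keep their surrounding spaces (" world"), so trimming, if wanted, is left to the caller.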
Another way would be to find some existing library that does it well. Surely, CSV is common enough that such a thing must exist?
edit:
You mention that speed is vital, so I wanted to point out that (as long as quoted strings aren't allowed to contain line breaks) each line can be processed independently, in parallel.
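One way to process lines in parallel while keeping the output in file order is PLINQ's AsParallel().AsOrdered(). This is a sketch under the same assumption the answer states (no quoted field spans a line break); ParallelLineConverter is a hypothetical name, and convertLine stands in for whatever per-line CSV-to-pipe conversion you use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelLineConverter
{
    // Runs convertLine on many lines concurrently; AsOrdered keeps the
    // results in the same order as the input lines.
    public static List<string> ConvertAll(IEnumerable<string> lines,
                                          Func<string, string> convertLine)
    {
        return lines.AsParallel()
                    .AsOrdered()
                    .Select(convertLine)
                    .ToList();
    }
}
```

Typical input would be File.ReadLines(path), which streams the file instead of loading it all at once; whether parallelism actually helps depends on how expensive the per-line conversion is compared with disk I/O.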
I ended up using the CSV parser that I didn't know we already had (it comes as part of our code generation tool), and noting that ""17.5179C,"" is not valid and will cause errors.
