I'm sure this isn't as complicated as I'm making it.
I have a string that follows the following pattern:
#,"value"#,"next value"#,"next value" . . .
I need to parse out the number/value pairs in order to use the data in my application.
Here is a sample of a return:
0,"120"1,""10,"298630427"29,"577015971830"30,"SG MSNA "33,"A1"34,"4625"35,"239"36,"2105"37,"2759"60,"15"112,"0"
To complicate matters the string can contain newline characters (\r,\n, or \r\n).
I can handle that by simply removing the newlines with a few string.replace calls.
I would ultimately like to parse the data into key/value pairs. I just can't seem to get my mind onto the right path.
I apologize if this is trivial, but I've been pulling 18+ hour days for two months trying to meet a deadline and my brain is shot. Any assistance or guidance in the right direction will be most appreciated.
var numVal = Regex.Matches(input, @"(\d+),""([^""]*)""")
    .Cast<Match>()
    .Select(x => new
    {
        num = x.Groups[1].Value,    // the number before the comma
        value = x.Groups[2].Value   // the quoted text, which may be empty
    });
Now you can iterate over numVal:
foreach (var nv in numVal)
{
    Console.WriteLine(nv.num);
    Console.WriteLine(nv.value);
}
If you're going straight to key/value pairs, you might be able to use LINQ to make your life easier. You'll have to be aware of and handle input that doesn't match the key/value format (this assumes values never contain a double quote and keys never repeat). Splitting on the quote character gives alternating key and value tokens, so you might be able to achieve it using something like this.
// Tokens alternate between key and value: "0,", "120", "1,", "", "10,", "298630427", ...
var tokens = string_value.Split('"');
var dictionary = Enumerable.Range(0, tokens.Length / 2)
    .ToDictionary(i => tokens[2 * i].Trim(',', ' ', '\r', '\n'), i => tokens[2 * i + 1]);
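For completeness, here is a minimal sketch that combines the regex idea with the newline concern from the question. ParsePairs is a hypothetical helper name, not something from the answers above, and it assumes that values never contain a double quote and that a repeated key should simply overwrite the earlier one. It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: turns #,"value"#,"value"... into a dictionary.
// Assumes values contain no double quote; a repeated key overwrites the earlier entry.
Dictionary<int, string> ParsePairs(string input)
{
    var pairs = new Dictionary<int, string>();
    // \s* tolerates whitespace (including \r and \n) around the comma.
    // Line breaks between pairs are skipped because Matches scans ahead for the next match.
    foreach (Match m in Regex.Matches(input, @"(\d+)\s*,\s*""([^""]*)"""))
    {
        pairs[int.Parse(m.Groups[1].Value)] = m.Groups[2].Value;
    }
    return pairs;
}

var sample = "0,\"120\"1,\"\"10,\"298630427\"\r\n30,\"SG MSNA \"112,\"0\"";
foreach (var kv in ParsePairs(sample))
{
    Console.WriteLine($"{kv.Key} => [{kv.Value}]");
}
```

Because the pattern never requires the pairs to be adjacent, there is no need to strip the newlines first with string.Replace.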
Related
I have a file that is formatted this way --
{2000}000000012199{3100}123456789*{3320}110009558*{3400}9876
54321*{3600}CTR{4200}D2343984*JOHN DOE*1232 STREET*DALLAS TX
78302**{5000}D9210293*JANE DOE*1234 STREET*SUITE 201*DALLAS
TX 73920**
Basically, the number in curly brackets denotes field, followed by the value for that field. For example, {2000} is the field for "Amount", and the value for it is 121.99 (implied decimal). {3100} is the field for "AccountNumber" and the value for it is 123456789*.
I am trying to figure out a way to split the file into "records" and each record would contain the record type (the value in the curly brackets) and record value, but I don't see how.
How do I do this without a loop going through each character in the input?
A different way to look at it.... The { character is a record delimiter, and the } character is a field delimiter. You can just use Split().
var input = @"{2000}000000012199{3100}123456789*{3320}110009558*{3400}987654321*{3600}CTR{4200}D2343984*JOHN DOE*1232 STREET*DALLAS TX78302**{5000}D9210293*JANE DOE*1234 STREET*SUITE 201*DALLASTX 73920**";
var rows = input.Split( new [] {"{"} , StringSplitOptions.RemoveEmptyEntries);
foreach (var row in rows)
{
var fields = row.Split(new [] { "}"}, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine("{0} = {1}", fields[0], fields[1]);
}
Output:
2000 = 000000012199
3100 = 123456789*
3320 = 110009558*
3400 = 987654321*
3600 = CTR
4200 = D2343984*JOHN DOE*1232 STREET*DALLAS TX78302**
5000 = D9210293*JANE DOE*1234 STREET*SUITE 201*DALLASTX 73920**
This regular expression should get you going:
Match a literal {
Match 1 or more digits ("a number")
Match a literal }
Match all characters that are not an opening {
\{\d+\}[^{]+
It assumes that the values themselves cannot contain an opening curly brace. If they can, you need to be more clever, e.g. @"\{\d+\}(?:\\{|[^{])+" (there are likely better ways).
Create a Regex instance and have it match against the text. Each "field" will be a separate match:
var text = @"{123}abc{456}xyz";
var regex = new Regex(@"\{\d+\}[^{]+", RegexOptions.Compiled);
// Declare the loop variable as Match: on older frameworks MatchCollection enumerates as object.
foreach (Match match in regex.Matches(text)) {
    Console.WriteLine(match.Groups[0].Value);
}
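If you want the field number and the value separately rather than the whole match, capture groups can do the splitting. The following is only a sketch: ParseRecords is a hypothetical name, and it assumes that line breaks in the file are wrapping artifacts that can be dropped and that values never contain an opening curly brace. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns (field, value) tuples in file order.
List<(string Field, string Value)> ParseRecords(string text)
{
    // Assumption: line breaks are wrapping artifacts, not data, so remove them first.
    var flat = text.Replace("\r", "").Replace("\n", "");
    var records = new List<(string Field, string Value)>();
    // Group 1 is the number inside the braces, group 2 is everything up to the next '{'.
    foreach (Match m in Regex.Matches(flat, @"\{(\d+)\}([^{]*)"))
    {
        records.Add((m.Groups[1].Value, m.Groups[2].Value));
    }
    return records;
}

foreach (var (field, value) in ParseRecords("{2000}000000012199{3100}123456789*{3400}9876\n54321*"))
{
    Console.WriteLine($"{field} = {value}");
}
```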
This doesn't fully answer the question, but it was getting too long to be a comment, so I'm leaving it here in Community Wiki mode. It does, at least, present a better strategy that may lead to a solution:
The main thing to understand here is it's rare — like, REALLY rare — to genuinely encounter a whole new kind of a file format for which an existing parser doesn't already exist. Even custom applications with custom file types will still typically build the basic structure of their file around a generic format like JSON or XML, or sometimes an industry-specific format like HL7 or MARC.
The strategy you should follow, then, is to first determine exactly what you're dealing with. Look at the software that generates the file; is there an existing SDK, reference, or package for the format? Or look at the industry surrounding this data; is there a special set of formats related to that industry?
Once you know this, you will almost always find an existing parser ready and waiting, and it's usually as easy as adding a NuGet package. These parsers are genuinely faster, need less code, and will be less susceptible to bugs (because most will have already been found by someone else). It's just an all-around better way to address the issue.
Now what I see in the question isn't something I recognize, so it's just possible you genuinely do have a custom format for which you'll need to write a parser from scratch... but even so, it doesn't seem like we're to that point yet.
Here is how to do it with LINQ, without a regex:
string x = "{2000}000000012199{3100}123456789*{3320}110009558*{3400}987654321*{3600}CTR{4200}D2343984*JOHN DOE*1232 STREET*DALLAS TX78302**{5000}D9210293*JANE DOE*1234 STREET*SUITE 201*DALLASTX 73920**";
var result =
    x.Split('{', StringSplitOptions.RemoveEmptyEntries)
     .Aggregate(new List<Tuple<string, string>>(),
        (l, z) => { var az = z.Split('}');   // az[0] is the field number, az[1] the value
                    l.Add(new Tuple<string, string>(az[0], az[1]));
                    return l; });
I've tried googling this and looking at questions/answers on here but I'm not having much luck.
I have a list of values that I've already put into an array by splitting on the commas (","), but now I need to split on the colons (":"). I am at a loss for how to do this; nothing I've tried so far has worked, and I can't figure out how to fix it.
string AdditionalData = "Name: John, Age: 43, Location: California";
string[] firstData = AdditionalData.Split(',');
The above code is how far I've gotten - this works - but no matter what I try I can't figure out how to split the data on the colon. Basically, I'm looking to take the array "firstData" and make that into a new array.
Any help would be appreciated and apologies for the simplicity of the question, I'm new to this!
Side note: This is part of an ASP.NET MVC project, if that is of any help (the tag was removed). The results are also displayed as a web page, not in the console.
Iterate each array item using a foreach loop.
foreach(string dataString in firstData)
{
    string[] temp = dataString.Split(':');
//do something with the new array here
}
I guess this is what you want, but I'm not sure.
var secondData = firstData.Select(str => str.Split(':'));
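If the goal is to look values up by name, the two splits can feed a dictionary. This is a sketch rather than a definitive answer: ToPairs is a hypothetical name, and it assumes each item has the form "Key: Value" and that keys are unique (ToDictionary throws on a duplicate key). It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: "Name: John, Age: 43" -> { Name = John, Age = 43 }.
Dictionary<string, string> ToPairs(string data)
{
    return data.Split(',')
        // A limit of 2 keeps any further colons inside the value.
        .Select(item => item.Split(new[] { ':' }, 2))
        // Skip items that have no colon at all.
        .Where(parts => parts.Length == 2)
        // Trim removes the space left behind after each comma and colon.
        .ToDictionary(parts => parts[0].Trim(), parts => parts[1].Trim());
}

var pairs = ToPairs("Name: John, Age: 43, Location: California");
Console.WriteLine(pairs["Age"]); // prints 43
```

In an MVC project the resulting dictionary could be passed to the view as the model instead of being written to the console.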
One of the things I really value from this community is learning ways to do things in two lines that would normally take me twenty. In that spirit, I've done my best to take some string parsing down from about a dozen lines to three. But I feel like there's someone out there who wants to show me how this is actually a mess. Just for my own edification, is there a cleaner way to do the following? Could it all be done in one line?
string getThis = "<add key=\"messageFilter\" value=\"";
string subStr = strFile.Substring(strFile.IndexOf(getThis) + getThis.Length);
string[] igPhrases = subStr.Substring(0, subStr.IndexOf(";\"")).Split(';');
UPDATE
Thanks for the quick responses! Really helpful examples AND good advice with a minimum of snark. :) Fewer lines is not the same thing as clean and elegant, and reducing lines may actually make the code worse.
Let me rephrase the question.
I've got an XML doc that has the following line: <add key="messageFilter" value="Out of Office AutoReply;Automatic reply;"/>. This doc tells our automated ticketing system not to create tickets from emails that have those phrases in the subject line. Otherwise, we get an endless loop.
I'm working on a small program that will list phrases already included, and then allow users to add new phrases. If we notice that a new autoreply message is starting to loop through the system, we need to be able to add the language of that message to the filter.
I don't work a lot with XML. I like Sperske's solution, but I don't know how to make it dynamic. In other words, I can't put the value in my code. I need to find the key "messageFilter" and then get all the values associated with that key.
What I've done works, but it seems a little cumbersome. Is there a more straightforward way to get the key values? And to add a new one?
A slightly different one liner (split for readability):
System.Xml.Linq.XDocument
.Parse("<add key='messageFilter' value='AttrValue'/>")
.Root
.Attribute("value")
.Value
Outputs:
AttrValue
To address the updated question you could turn all of your <add> nodes into a dictionary (borrowing from Pako's excellent answer, and using a slightly longer string):
var keys = System.Xml.Linq.XDocument
.Parse("<keys><add key='messageFilter' value='AttrValue'/><add key='userFilter' value='AttrValueUser'/></keys>")
.Descendants("add")
.ToDictionary(r => r.Attribute("key").Value, r => r.Attribute("value").Value);
This lets you access your keys like so:
keys["messageFilter"] == "AttrValue"
keys["userFilter"] == "AttrValueUser"
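The updated question also asks how to list the phrases for a key and add a new one. One possible sketch with LINQ to XML follows. GetPhrases and AddPhrase are hypothetical helper names, the appSettings root element is an assumption about the surrounding file, and the code assumes the key exists (First throws otherwise). It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: the semicolon-separated phrases stored under the given key.
string[] GetPhrases(XDocument doc, string key)
{
    var node = doc.Descendants("add").First(e => (string)e.Attribute("key") == key);
    var raw = (string)node.Attribute("value") ?? "";
    return raw.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
}

// Hypothetical helper: appends a phrase and keeps the trailing semicolon used in the question.
void AddPhrase(XDocument doc, string key, string phrase)
{
    var node = doc.Descendants("add").First(e => (string)e.Attribute("key") == key);
    var phrases = GetPhrases(doc, key).Concat(new[] { phrase });
    node.SetAttributeValue("value", string.Join(";", phrases) + ";");
}

// Assumption: the <add> element sits under an <appSettings> root.
var config = XDocument.Parse(
    "<appSettings><add key=\"messageFilter\" value=\"Out of Office AutoReply;Automatic reply;\"/></appSettings>");
AddPhrase(config, "messageFilter", "Auto-Response");
Console.WriteLine(string.Join(" | ", GetPhrases(config, "messageFilter")));
```

With a real file you would presumably use XDocument.Load and Save on the file path instead of Parse.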
It has been answered already, but for future readers - if you want to parse bigger XML, with root and many add nodes, you may need to use something slightly different.
string xmlPart = "<add key=\"messageFilter\" value=\"\" />";
string xml = "<root>" + xmlPart + "</root>";
var x = XDocument.Parse(xml, LoadOptions.None);
var attributes1 = x.Descendants("add").Select(n => n.Attributes());
var attributes2 = x.Descendants("add").SelectMany(n => n.Attributes());
This will get you IEnumerable<IEnumerable<XAttribute>> (see attributes1) or IEnumerable<XAttribute> (see attributes2). The second option simply flattens the results: all attributes are held in one collection, no matter which node they came from.
Of course, nothing stops you from filtering the XAttributes by name or some other criteria. It's all up to you!
one ugly line:
string[] igPhrases = strFile.Substring(strFile.IndexOf(getThis) + ("<add key=\"messageFilter\" value=\"").Length).Substring(0, strFile.Substring(strFile.IndexOf("<add key=\"messageFilter\" value=\"") + ("<add key=\"messageFilter\" value=\"").Length).IndexOf(";\"")).Split(';');
I need to somehow detect if there is a parent OU value, and if there is retrieve it.
For example, here there is no parent:
LDAP://servera/OU=Santa Cruz,DC=contoso,DC=com
But here, there is a parent:
LDAP://servera/OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com
So I would need to retrieve that "Ventas" string.
Another example:
LDAP://servera/OU=Contabilidad,OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com
I would need to retrieve that "Ventas" string as well.
Any suggestions on how to tackle this?
string ldap = "LDAP://servera/OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com";
Match match = Regex.Match(ldap, @"LDAP://\w+/OU=(?<toplevelou>\w+?),OU=");
if(match.Success)
{
Console.WriteLine(match.Result("${toplevelou}"));
}
I'd find the first occurrence of OU=... and get its value. Then I'd check if there was another occurrence after it. If so, return the value I've got. If not, return whatever it is you want if there's no parent (String.Empty, null, or whatever).
You could also use a regular expression like this:
var regex = new Regex(@"OU=(.*?),");
var matches = regex.Matches(ldapString);
Then check how many matches there are. If >1 return the captured value from the first match.
Update
The regex above needs to be improved to allow the case where there's an escaped comma (\,) in the LDAP string. Maybe something like:
var regex = new Regex(@"OU=((.*?(\\\,)+?)+?),");
That may be broken, and there may be simpler way to do the same thing. I'm not a regex wizard.
Another Update
Per Kimberly's comment below, the regex should be @"OU=((?:.*?(?:\\\,)*?)+?),".
Call me crazy, but I'd do it this way (hey ma, look, a one-liner!):
var str = "LDAP://servera/OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com";
var result = str.Substring(str.LastIndexOf('/') + 1).Split(',')
.Select(s => s.Split('='))
.Where(a => a[0] == "OU")
.Select(a => a[1])
.Reverse().Skip(1).FirstOrDefault();
result is either null or has the string you want. This will work no matter how many OUs are in there and return the second-to-last one, as long as the format of the string is valid to begin with.
Update: possible improvements:
The above will not work correctly if your DN contains an escaped forward slash or an escaped comma.
To fix both of these you need to use regular expressions. Change:
str.Substring(str.LastIndexOf('/') + 1).Split(',')
to:
Regex.Split(Regex.Split(str, "(?<!\\\\)/").Last(), "(?<!\\\\),")
What this does is isolate the DN by taking the last part of str after splitting on forward slashes, and then split the DN into parts by splitting on commas. In both cases, negative lookbehind is used to make sure that the slashes/commas are not escaped.
Not as pretty, I know. But it's still a one-liner (yay!) and it still allows you to use LINQ further down to handle multiple OUs any way you choose to.
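For readers who would rather not have a one-liner, the same idea can be spelled out step by step. This is only a sketch: GetParentOu is a hypothetical name, it returns an empty string when there is no second OU (an arbitrary choice), and it assumes the DN contains no escaped forward slash. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the second-to-last OU, or "" when there are fewer than two OUs.
string GetParentOu(string ldapPath)
{
    // Everything after the last '/' is treated as the DN (assumes no escaped '/').
    var dn = ldapPath.Substring(ldapPath.LastIndexOf('/') + 1);
    var ous = Regex.Split(dn, @"(?<!\\),")              // split on commas that are not escaped
        .Select(part => part.Split(new[] { '=' }, 2))   // "OU=Ventas" -> ["OU", "Ventas"]
        .Where(kv => kv.Length == 2 && kv[0].Trim().Equals("OU", StringComparison.OrdinalIgnoreCase))
        .Select(kv => kv[1])
        .ToList();
    return ous.Count >= 2 ? ous[ous.Count - 2] : string.Empty;
}

Console.WriteLine(GetParentOu("LDAP://servera/OU=Contabilidad,OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com")); // prints Ventas
```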
This is my first stack message. Hope you can help.
I have several strings I need to break up for use later. Here are a couple of examples of what I mean:
fred-064528-NEEDED
frederic-84728957-NEEDED
sam-028-NEEDED
As you can see above, the string lengths vary greatly, so I believe regex is the only way to achieve what I want. What I need is the rest of the string after the second hyphen ('-').
I am very weak at regex, so any help would be great.
Thanks in advance.
Just to offer an alternative without using regex:
foreach (string s in list)
{
    int x = s.LastIndexOf('-');
    string sub = s.Substring(x + 1);   // everything after the last hyphen
}
Add validation to taste.
Something like this. It will take anything (except line breaks) after the second '-' including the '-' sign.
var exp = @"^\w*-\w*-(.*)$";
var match = Regex.Match("frederic-84728957-NEE-DED", exp);
if (match.Success)
{
var result = match.Groups[1]; //Result is NEE-DED
Console.WriteLine(result);
}
EDIT: I answered another question which relates to this. Except, it asked for a LINQ solution and my answer was the following which I find pretty clear.
Pimp my LINQ: a learning exercise based upon another post
var result = String.Join("-", inputData.Split('-').Skip(2));
or
var result = inputData.Split('-').Skip(2).FirstOrDefault(); //If the last part is NEE-DED then only NEE is returned.
As mentioned in the other SO thread it is not the fastest way of doing this.
If they are part of larger text:
(\w+-){2}(\w+)
If they are presented as whole lines, and you know you don't have other hyphens, you may also use:
[^-]*$
Another option, if you have each line as a string, is to use split (again, depending on whether or not you're expecting extra hyphens, you may omit the count parameter, or use LastIndexOf):
string[] tokens = line.Split("-".ToCharArray(), 3);
string s = tokens.Last();
This should work:
.*?-.*?-(.*)
This should do the trick:
([^\-]+)\-([^\-]+)\-(.*?)$
The regex pattern will be
(?<first>.*)?-(?<second>.*)?-(?<third>.*)?(\s|$)
Then you can read the named group "third" to get the text after the 2nd hyphen.
Alternatively,
you can do a string.Split('-') and take the third item (index 2) from the array.
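A plain IndexOf version is also possible and keeps any further hyphens in the result. This is a sketch under the assumption that an input with fewer than two hyphens should yield an empty string. AfterSecondHyphen is a hypothetical name, and the code uses top-level statements (C# 9 or later).

```csharp
using System;

// Hypothetical helper: everything after the second '-', or "" if there is no second hyphen.
string AfterSecondHyphen(string s)
{
    int first = s.IndexOf('-');
    // Only look for the second hyphen if the first one exists.
    int second = first < 0 ? -1 : s.IndexOf('-', first + 1);
    return second < 0 ? string.Empty : s.Substring(second + 1);
}

Console.WriteLine(AfterSecondHyphen("frederic-84728957-NEE-DED")); // prints NEE-DED
```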