Splitting a string with regex - c#

I have a list of strings that look like this:
List<string> list = new List<string>();
list.Add("AAPL7131221P00590000");
list.Add("AAPL7131206C00595000");
list.Add("AAPL7131213P00600000");
I would like to extract the date that sits between AAPL7 and the next letter, which is either C or P, and add it to a new list. How do I use regex to get 131221, 131206, and 131213 so I can populate a new list?

You don't need regex for this one, you can just use Substring, assuming that all your inputs will be the same number of characters.
var startingString = "AAPL7"; // holds whatever the starting string is
var input = "AAPL7131221P00590000";
var outputDate = input.Substring(startingString.Length, 6);
So if you wanted to make this a one-liner for making a collection:
List<string> allDates = yourInputValues
.Select(x => x.Substring(startingString.Length, 6))
.ToList();

Consider the following code snippet...
string startPattern = "AAPL7"; // GOOGL, GOOG, etc
List<string> newlist = list
.Select(n => Regex.Match(n, string.Format(@"(?<=^{0})\d+", startPattern)).Value)
.ToList();
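If the prefix can vary (GOOGL, GOOG, etc.), it may be worth escaping it before building the pattern and skipping inputs that do not match. The following is only a sketch under those assumptions: the class and method names are made up for illustration, and it assumes the date is always exactly six digits followed by C or P.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class OptionSymbolDates
{
    // Returns the six digits that follow `prefix` and precede a 'C' or 'P'.
    // Inputs that do not fit this shape are skipped rather than yielding "".
    public static List<string> Extract(IEnumerable<string> symbols, string prefix)
    {
        // Regex.Escape guards against prefixes that contain regex metacharacters.
        var pattern = new Regex("^" + Regex.Escape(prefix) + @"(?<date>\d{6})[CP]");
        return symbols
            .Select(s => pattern.Match(s))
            .Where(m => m.Success)
            .Select(m => m.Groups["date"].Value)
            .ToList();
    }
}
```

Calling OptionSymbolDates.Extract(list, "AAPL7") on the three strings from the question gives 131221, 131206 and 131213.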

//Regex regex = new Regex("(\\w{5})(\\d*)(\\w*)");
Regex regex = new Regex("(\\w*)(\\d{6})([PC])(\\d*)");
List<string> list = new List<string>();
list.Add("AAPL7131221P00590000");
list.Add("AAPL7131206C00595000");
list.Add("AAPL7131213P00600000");
List<string> extracted = new List<string>();
foreach (string item in list)
{
extracted.Add(regex.Split(item)[2]);
}
Is this what you want?

I'll just add something that may help in case the date length varies (e.g. 14131 for 31 January 2014):
string startpart = "AAPL7"; // or whatever
string mainstring = "AAPL714231P00590000"; // or any other input
mainstring = mainstring.Substring(startpart.Length); // this will throw away startpart
int index;
if (mainstring.IndexOf("P") >= 0) index = mainstring.IndexOf("P"); //if there is no "P" it gives -1
else index = mainstring.IndexOf("C");
string date = mainstring.Substring(0,index);
I guess this is a safer approach that handles a wider range of inputs.
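The same idea can be written with IndexOfAny, which finds whichever of C or P comes first. This is a sketch rather than a drop-in replacement: the class and method names are invented here, and returning null for malformed input is an assumption about what the caller wants. Note that the search starts after the prefix, because "AAPL7" itself contains a P.

```csharp
using System;

public static class VariableLengthDate
{
    // Returns the text between `prefix` and the first 'C' or 'P' after it,
    // or null when the input does not have that shape.
    public static string Extract(string input, string prefix)
    {
        if (input == null || !input.StartsWith(prefix, StringComparison.Ordinal))
            return null;

        // Start searching after the prefix so letters inside it are ignored.
        int end = input.IndexOfAny(new[] { 'C', 'P' }, prefix.Length);
        return end < 0 ? null : input.Substring(prefix.Length, end - prefix.Length);
    }
}
```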

Related

Split a string of Number and characters

I have a List liRoom that contains both alphanumeric and purely alphabetic strings, for example:
List<string> liRoom = new List<string>() {"Room1","Room2","Room3",
"Room4","Hall","Room5","Assembly",
"Room6","Room7","Room8","Room9"};
I want to take the maximum numeric value from this list of strings.
I tried to do it this way:
var ss = new Regex("(?<Alpha>[a-zA-Z]+)(?<Numeric>[0-9]+)");
List<int> liNumeric = new List<int>();
foreach (string st in liRoom)
{
var varMatch = ss.Match(st);
liNumeric.Add(Convert.ToInt16(varMatch.Groups["Numeric"].Value));
}
int MaxValue = liNumeric.Max();// Result Must be 9 from above Example.
And
List<int> liNumeric = new List<int>();
foreach (string st in liRoom)
{
liNumeric.Add( int.Parse(new string(st.Where(char.IsDigit).ToArray())));
}
int MaxValue = liNumeric.Max();// Result Must be 9 from above Example.
But both throw an error when st is Hall or Assembly.
How can I do this?
There are a few reasons you will get an exception in your code. I'm adding a few conditions to guard against them.
List<int> liNumeric = new List<int>();
foreach (string st in liRoom)
{
// int.Parse will fail if you don't have any digit in the input
if(st.Any(char.IsDigit))
{
liNumeric.Add(int.Parse(new string(st.Where(char.IsDigit).ToArray())));
}
}
if (liNumeric.Any()) //Max will fail if you don't have items in the liNumeric
{
int MaxValue = liNumeric.Max();
}
Please try the following:
List<string> liRoom = new List<string>() {"Room1","Room2","Room3",
"Room4","Hall","Room5","Assembly",
"Room6","Room7","Room8","Room9"};
var re = new Regex(@"\d+");
int max = liRoom.Select(_ => re.Match(_))
.Where(_ => _.Success)
.Max( _ => int.Parse(_.Value));
/*
max = 9
*/
You don't need foreach, it can be done with one statement:
int value = liRoom.Where(x => x.Any(char.IsDigit))
.Select(x => Convert.ToInt32(new String(x.Where(char.IsDigit).ToArray())))
.Max();
It seems odd but it's working. :)
You should check whether the match succeeded before adding the value:
if (varMatch.Success)
{
liNumeric.Add(Convert.ToInt16(varMatch.Groups["Numeric"].Value));
}
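Putting the checks from the answers above together, one possible shape is a small helper that returns null when no string contains a number, so Max never runs on an empty sequence of plain ints. This is a sketch only: the class and method names are hypothetical, it uses [0-9] instead of \d so that int.Parse only ever sees ASCII digits, and it assumes the numbers fit in an int.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RoomNumbers
{
    private static readonly Regex Digits = new Regex("[0-9]+");

    // Returns the largest number embedded in any name,
    // or null if no name contains digits.
    public static int? MaxNumber(IEnumerable<string> names)
    {
        return names
            .Select(n => Digits.Match(n))
            .Where(m => m.Success)              // skips "Hall", "Assembly", ...
            .Select(m => (int?)int.Parse(m.Value))
            .Max();                             // Max over int? yields null when empty
    }
}
```

For the liRoom list in the question, RoomNumbers.MaxNumber(liRoom) returns 9.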

Cannot replace last element in string List

I have an input file that includes data on an entertainer and their performance score. For example,
1. Bill Monohan from North Town 10.54
2. Mary Greenberg from Ohio 3.87
3. Sean Hollen from Markell 7.22
I want to be able to take the last number from a line (their score), perform some math on it, and then replace the old score with the new score.
Here's a brief piece of code for what I'm trying to do:
string line;
StreamReader reader = new StreamReader(@"file.txt");
//Read each line and split by spaces into a List.
while ((line = reader.ReadLine())!= null){
//Find last item in List and convert to a Double in order to perform calculations.
List<string> l = new List<string>();
l = line.Split(null).ToList();
string lastItem = line.Split(null).Last();
Double newItem = Convert.ToDouble(lastItem);
/*Do some math*/
/*Replace lastItem with newItem*/
System.Console.WriteLine(line); }
When I write the new line, nothing changes but I want lastItem to be switched with newItem at the end of the line now. I've tried using:
l[l.Length - 1] = newItem.ToString();
But I'm getting no luck. I just need the best way to replace the last value of a string List like this. I've been going at this for a few hours now and I'm almost at the end of my rope.
Please help me c# masters!
You can use a regular expression with a MatchEvaluator to get the number from each line, do the calculation, and replace the original number with the new one:
string line = "1. Bill Monohan from North Town 10.54";
line = Regex.Replace(line, @"(\d+\.?\d*)$", m => {
decimal value = Decimal.Parse(m.Groups[1].Value);
value = value * 2; // calculation
return value.ToString();
});
This regex captures the decimal number at the end of the input string. Output:
1. Bill Monohan from North Town 21.08
You're not changing your line object before calling WriteLine.
You will have to rebuild the line, something like this:
var items = line.Split();
items[items.Length - 1] = "10"; // replace the last item
var newLine = string.Join(" ", items);
Tip: strings are immutable, look it up.
This should work:
//var l = new List<string>(); // you don't need this
var l = line.Split(null).ToList();
var lastItem = l.Last(); // line.Split(null).Last(); don't split twice
var newItem = Convert.ToDouble(lastItem, CultureInfo.InvariantCulture);
/*Do some math*/
/*Replace lastItem with newItem*/
l[l.Count - 1] = newItem.ToString(); // change the last element
//Console.WriteLine(line); // line is still the original string, so this would print the old value
Console.WriteLine(string.Join(" ", l)); // create new string
This would probably do the job for you. A word on reading files, though: if the file fits in memory, read it all at once. That typically means fewer disk accesses (depending on file size), and you do not have to worry about file handles.
// Read the lines from the file; returns a string[]
var lines = File.ReadAllLines(@"file.txt");
foreach (var line in lines)
{
var splitLine = line.Split(' ');
var score = double.Parse(splitLine.Last(), CultureInfo.InvariantCulture);
// The math wizard is in town!
score = score + 3;
// Put it back
splitLine[splitLine.Count() - 1] = score.ToString();
// newLine is the new line, what should we do with it?
var newLine = string.Join(" ", splitLine);
// Lets print it cause we are out of ideas!
Console.WriteLine(newLine);
}
What do you want to do with the end result? Do you want it written back to file?
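The split, parse, replace and join steps from the answers above can be wrapped in one helper. This is a sketch under a few assumptions that are not in the question: the names are hypothetical, scores use a dot as the decimal separator (hence InvariantCulture for both parsing and formatting), and the result is written with at most two decimals.

```csharp
using System;
using System.Globalization;

public static class ScoreLine
{
    // Applies `transform` to the trailing number of `line` and returns the rebuilt line.
    // Lines whose last token is not a number are returned unchanged.
    public static string UpdateLastScore(string line, Func<double, double> transform)
    {
        string[] parts = line.Split(' ');
        double score;
        if (!double.TryParse(parts[parts.Length - 1], NumberStyles.Float,
                             CultureInfo.InvariantCulture, out score))
            return line;

        // "0.##" (at most two decimals) is an assumed output format.
        parts[parts.Length - 1] =
            transform(score).ToString("0.##", CultureInfo.InvariantCulture);
        return string.Join(" ", parts);
    }
}
```

For example, ScoreLine.UpdateLastScore("3. Sean Hollen from Markell 7.22", s => s * 2) returns "3. Sean Hollen from Markell 14.44".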
Try this
string subjectString = "Sean Hollen from Markell 7.22";
double Substring =double.Parse(subjectString.Substring(subjectString.IndexOf(Regex.Match(subjectString, @"\d+").Value), subjectString.Length - subjectString.IndexOf(Regex.Match(subjectString, @"\d+").Value)).ToString());
double NewVal = Substring * 10; // Or any of your operation
subjectString = subjectString.Replace(Substring.ToString(), NewVal.ToString());
Note: This will not work if the number appears twice on the same line
You are creating and initializing the list in a loop, hence it contains always only the current line. Do you want to find the highest score of all entertainers or the highest score of each entertainer (in case an entertainer could repeat in the file)?
However, here is an approach that gives you both:
var allWithScore = File.ReadAllLines(path)
.Select(l =>
{
var split = l.Split();
string entertainer = string.Join(" ", split.Skip(1).Take(split.Length - 2));
double score;
bool hasScore = double.TryParse(split.Last(), NumberStyles.Float, CultureInfo.InvariantCulture, out score);
return new { line = l, split, entertainer, hasScore, score };
})
.Where(x => x.hasScore);
// highest score of all:
double highestScore = allWithScore.Max(x => x.score);
// entertainer with highest score
var entertainerWithHighestScore = allWithScore
.OrderByDescending(x => x.score)
.GroupBy(x => x.entertainer)
.First();
foreach (var x in entertainerWithHighestScore)
Console.WriteLine("Entertainer:{0} Score:{1}", x.entertainer, x.score);
// all entertainer's highest scores:
var allEntertainersHighestScore = allWithScore
.GroupBy(x => x.entertainer)
.Select(g => g.OrderByDescending(x => x.score).First());
foreach (var x in allEntertainersHighestScore)
Console.WriteLine("Entertainer:{0} Score:{1}", x.entertainer, x.score);

How to retrieve a substring based on a first list match

In a string I need to recover a 7 char substring based on the first match from any item in a list. If a match is not made it should return an empty string.
I have the following code:
List<string> myList = new List<string>()
{
"TNCO",
"TNCB",
"TNIT"
};
string sample = "TNSD102, WHRK301, TNIT301, YTRE234";
//doesn't give an index
bool anyfound = myList.Any(w => sample.Contains(w));
//code that needs replacing
string code = sample.Substring(sample.IndexOf("TNC"), 7);
if (code == "")
{
code = sample.Substring(sample.IndexOf("TNIT"), 7);
}
The list is never likely to be more than 35-40 items and the strings < 50 chars.
Anyone able to point me in the right direction?
string val1 = (sample.Split(',').FirstOrDefault(w => myList.Any(m => w.Contains(m))) ?? string.Empty).Trim();
This gives you an IEnumerable of all matches:
var matches = from code in sample.Split(',')
from w in myList
where code.Trim().StartsWith(w)
select code;
To get the first value use FirstOrDefault. Then use the coalesce operator ?? to return an empty string if there was no match.
string firstMatch = (matches.FirstOrDefault() ?? "").Trim();
With data sets this small, you can simply split the string and search for the first match:
// split the sample string into separate entries
var entries = sample.Split(new char[] {',', ' '},
StringSplitOptions.RemoveEmptyEntries);
// find the first entry starting with any allowed prefix
var firstMatch = entries.FirstOrDefault (
e => myList.Any (l => e.StartsWith(l)));
// FirstOrDefault returns null if there are no matches
if (firstMatch == null)
Console.WriteLine("No match!");
else
Console.WriteLine(firstMatch);
Example output (DEMO):
TNIT301
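Since the question asks for an empty string when nothing matches, the split-and-search approach could be wrapped like this. It is a minimal sketch: the class and method names are invented for illustration, and it assumes entries are separated by commas and/or spaces as in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CodeFinder
{
    // Returns the first comma/space-separated entry of `sample` that starts
    // with any of `prefixes`, or "" when nothing matches.
    public static string FirstMatch(string sample, IEnumerable<string> prefixes)
    {
        var entries = sample.Split(new[] { ',', ' ' },
                                   StringSplitOptions.RemoveEmptyEntries);
        return entries.FirstOrDefault(
                   e => prefixes.Any(p => e.StartsWith(p, StringComparison.Ordinal)))
               ?? string.Empty;
    }
}
```

With the data from the question, CodeFinder.FirstMatch(sample, myList) returns "TNIT301".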
List<string> myList = new List<string> { "TNCO", "TNCB", "TNIT" };
string sample = "TNSD102, WHRK301, TNIT301, YTRE234";
string[] sampleItems = sample.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
var results = myList
.Select(prefix => sampleItems
.FirstOrDefault(item => item.StartsWith(prefix)) ?? "");
Running this code returns an index of 2 for the data above:
int keyIndex = myList.FindIndex(w => sample.Contains(w));
Note that keyIndex is the position of the matching prefix ("TNIT") in myList, not the position of the entry in sample.
You could also do the following to return the string value at index position keyIndex. With this data it yields TNIT301 (with a leading space), but only because the two positions happen to coincide:
var subStrValue = sample.Split(',')[keyIndex];

Extracting parts of a string c#

In C# what would be the best way of splitting this sort of string?
%%x%%a,b,c,d
So that I end up with the value between the %% AND another variable containing everything right of the second %%
i.e. var x = "x"; var y = "a,b,c,d"
Where a,b,c.. could be an arbitrarily long comma-separated list. I need to extract the list and the value between the two double-percentage signs.
To deal with the unbounded list, I thought of separating the string into %%x%% and a,b,c,d. At that point I can use something like this to get x:
var tag = "%%";
var startTag = tag;
int startIndex = s.IndexOf(startTag) + startTag.Length;
int endIndex = s.IndexOf(tag, startIndex);
return s.Substring(startIndex, endIndex - startIndex);
Would the best approach be to use regex, or to use lots of IndexOf and Substring calls to do the extracting based on the static %% characters?
Given that what you want is "x,a,b,c,d" the Split() function is actually pretty powerful and regex would be overkill for this.
Here's an example:
string test = "%%x%%a,b,c,d";
string[] result = test.Split(new char[] { '%', ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in result) {
Console.WriteLine(s);
}
Basically we ask it to split by both '%' and ',' and ignore empty results (e.g. the result between "%%"). Here's the result:
x
a
b
c
d
To Extract X:
If %% is always at the start then;
string s = "%%x%%a,b,c,d,h";
s = s.Substring(2,s.LastIndexOf("%%")-2);
//Console.WriteLine(s);
Else;
string s = "v,u,m,n,%%x%%a,b,c,d,h";
s = s.Substring(s.IndexOf("%%")+2,s.LastIndexOf("%%")-s.IndexOf("%%")-2);
//Console.WriteLine(s);
If you need to get them all at once then use this;
string s = "m,n,%%x%%a,b,c,d";
var myList = s.ToArray()
.Where(c=> (c != '%' && c!=','))
.Select(c=>c).ToList();
This'll let you do it all in one go:
string pattern = "^%%(.+?)%%(?:(.+?)(?:,|$))*$";
string input = "%%x%%a,b,c,d";
Match match = Regex.Match(input, pattern);
if (match.Success)
{
// "x"
string first = match.Groups[1].Value;
// { "a", "b", "c", "d" }
string[] repeated = match.Groups[2].Captures.Cast<Capture>()
.Select(c => c.Value).ToArray();
}
You can use char.IsLetter to get the list of all letters:
string test = "%%x%%a,b,c,d";
var l = test.Where(c => char.IsLetter(c)).ToArray();
var output = string.Join(", ", l.OrderBy(c => c));
Since you want the value between the %% and everything after in separate variables and you don't need to parse the CSV, I think a RegEx solution would be your best choice.
var inputString = @"%%x%%a,b,c,d";
var regExPattern = @"^%%(?<x>.+)%%(?<csv>.+)$";
var match = Regex.Match(inputString, regExPattern);
foreach (var item in match.Groups)
{
Console.WriteLine(item);
}
The pattern has 2 named groups called x and csv, so rather than just looping, you can easily reference them by name and assign them to values:
var x = match.Groups["x"];
var y = match.Groups["csv"];
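If you also want the list items as an array, the named-group idea can be combined with a plain Split on the second group. The helper below is a sketch, not the only way to do it: the names are hypothetical, the tag is matched lazily so it stops at the first closing %%, and an empty list after the tag is treated as valid.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TaggedList
{
    // Parses "%%tag%%a,b,c" into the tag and the comma-separated items.
    // Returns false when the input does not start with a %%...%% tag.
    public static bool TryParse(string input, out string tag, out string[] items)
    {
        var match = Regex.Match(input, @"^%%(?<tag>.+?)%%(?<csv>.*)$");
        tag = match.Success ? match.Groups["tag"].Value : null;
        items = match.Success
            ? match.Groups["csv"].Value.Split(new[] { ',' },
                                              StringSplitOptions.RemoveEmptyEntries)
            : new string[0];
        return match.Success;
    }
}
```

For "%%x%%a,b,c,d" this yields the tag "x" and the items a, b, c, d.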

Create dynamic string array in C# and add strings (outcome of a split method) into two separate arrays through a loop

I have a list of strings which includes strings in format: xx#yy
xx = feature name
yy = project name
Basically, I want to split these strings at # and store the xx part in one string array and the yy part in another to do further operations.
string[] featureNames = all xx here
string[] projectNames = all yy here
I am able to split the strings using the split method (string.split('#')) in a foreach or for loop in C# but I can't store two parts separately in two different string arrays (not necessarily array but a list would also work as that can be converted to array later on).
The main problem is to identify the two parts of each string after the split and then append them to separate string arrays.
This is one simple approach:
var xx = new List<string>();
var yy = new List<string>();
foreach(var line in listOfStrings)
{
var split = line.Split('#');
xx.Add(split[0]);
yy.Add(split[1]);
}
The above instantiates a list of xx and a list of yy, loops through the list of strings, and splits each one. It then adds the results of the split to the previously instantiated lists.
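One thing the loop above does not handle is an entry without a '#', where split[1] would throw IndexOutOfRangeException. A sketch that guards against that and hands back arrays, as the question asked, might look like this. The class and method names are made up, and silently skipping malformed entries is an assumption; you may prefer to throw or log instead.

```csharp
using System.Collections.Generic;

public static class FeatureSplitter
{
    // Splits each "feature#project" entry at the first '#'.
    // Entries without a '#' are skipped instead of throwing.
    public static void Split(IEnumerable<string> entries,
                             out string[] featureNames, out string[] projectNames)
    {
        var features = new List<string>();
        var projects = new List<string>();
        foreach (var entry in entries)
        {
            // The count of 2 keeps any further '#' inside the project part.
            string[] parts = entry.Split(new[] { '#' }, 2);
            if (parts.Length != 2) continue;
            features.Add(parts[0]);
            projects.Add(parts[1]);
        }
        featureNames = features.ToArray();
        projectNames = projects.ToArray();
    }
}
```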
How about the following:
List<String> xx = new List<String>();
List<String> yy = new List<String>();
var strings = yourstring.Split('#');
xx.Add(strings.First());
yy.Add(strings.Last());
var featureNames = new List<string>();
var productNames = new List<string>();
foreach (var productFeature in productFeatures)
{
var parts = productFeature.Split('#');
featureNames.Add(parts[0]);
productNames.Add(parts[1]);
}
How about
List<string> lst = ... // your list containging xx#yy
List<string> _featureNames = new List<string>();
List<string> _projectNames = new List<string>();
lst.ForEach(x =>
{
string[] str = x.Split('#');
_featureNames.Add(str[0]);
_projectNames.Add(str[1]);
});
string[] featureNames = _featureNames.ToArray();
string[] projectNames = _projectNames.ToArray();
You can do something like this:
var splits = input.Select(v => v.Split('#'));
var features = splits.Select(s => s[0]).ToList();
var projects = splits.Select(s => s[1]).ToList();
If you don't mind slightly more code but better performance and less pressure on garbage collector then:
var features = new List<string>();
var projects = new List<string>();
foreach (var split in input.Select(v => v.Split('#')))
{
features.Add(split[0]);
projects.Add(split[1]);
}
But overall I'd suggest to create class and parse your input (more C#-style approach):
public class ProjectFeature
{
public readonly string Project;
public readonly string Feature;
public ProjectFeature(string project, string feature)
{
this.Project = project;
this.Feature = feature;
}
public static IEnumerable<ProjectFeature> ParseList(IEnumerable<string> input)
{
return input.Select(v =>
{
var split = v.Split('#');
return new ProjectFeature(split[1], split[0]);
});
}
}
and use it later (just an example of possible usage):
var projectFeatures = ProjectFeature.ParseList(File.ReadAllLines(@"c:\features.txt")).ToList();
var features = projectFeatures.Select(f => f.Feature).ToList();
var projects = projectFeatures.Select(f => f.Project).ToList();
// ??? etc.
var all_XX = yourArrayOfStrings.Select(str => str.Split('#')[0]); // this will be an IEnumerable<string>
var all_YY = yourArrayOfStrings.Select(str => str.Split('#')[1]); // the same for YY, but make sure the element at [1] exists
The main problem is to identify the two parts of each string after the split and then append them to separate string arrays.
Why the different arrays? Wouldn't a dictionary be more fitting?
List<String> input = File.ReadAllLines(path).ToList(); // path is a placeholder for your input file, or whatever
var output = new Dictionary<String, String>();
foreach (String line in input)
{
var parts = line.Split('#');
output.Add(parts[0], parts[1]);
}
foreach (var feature in output)
{
Console.WriteLine("{0}: {1}", feature.Key, feature.Value);
}
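One caveat with the dictionary: Dictionary.Add throws an ArgumentException if the same feature name appears twice. If that can happen in your data, a lookup keeps every project per feature. The following is a sketch with hypothetical names, and it again assumes that entries without a '#' should simply be ignored.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FeatureLookup
{
    // Groups project names by feature name; repeated feature names are kept
    // together instead of causing a duplicate-key exception.
    public static ILookup<string, string> Build(IEnumerable<string> entries)
    {
        return entries
            .Select(e => e.Split(new[] { '#' }, 2))
            .Where(parts => parts.Length == 2)
            .ToLookup(parts => parts[0], parts => parts[1]);
    }
}
```

Indexing the result with a feature name, e.g. lookup["login"], enumerates all projects recorded for that feature, and an unknown key gives an empty sequence rather than an exception.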
Try this.
var ls = new List<string>();
ls.Add("123#project");
ls.Add("123#project1");
var f = from c in ls
select new
{
XX = c.Split('#')[0],
YY = c.Split('#')[1]
};
string [] xx = f.Select (x => x.XX).ToArray();
string [] yy = f.Select (x => x.YY).ToArray();
