C# Use Regex to split on Words

This is a stripped down version of code I am working on. The purpose of the code is to take a string of information, break it down, and parse it into key value pairs.
Using the info in the example below, a string might look like:
"DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567"
One further point about the above example: at least three of the features we have to parse out will occasionally include additional values. Here is an updated fake example string:
"DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568"
The problem with this is that the code refuses to split out DIVIDE and DIV information separately. Instead, it keeps splitting at DIV and then assigning the rest of the information as the value.
Is there a way to tell my code that DIVIDE and DIV need to be parsed out as two separate values, and to not turn DIVIDE into DIV?
public List<string> FeatureFilterStrings
{
// All possible feature types from the EWSD switch.
get
{
return new List<string>() { "DIVIDE", "DIV", "CLACOS", "INT"};
}
}
public void Parse(string input){
Func<string, bool> queryFilter = delegate(string line) { return FeatureFilterStrings.Any(s => line.Contains(s)); };
Regex regex = new Regex(@"(?=\\bDIVIDE|DIV|CLACOS|INT)");
string[] ms = regex.Split(updatedInput);
List<string> queryLines = new List<string>();
// takes the parsed out data and assigns it to the queryLines List<string>
foreach (string m in ms)
{
queryLines.Add(m);
}
var features = queryLines.Where(queryFilter);
foreach (string feature in features)
{
foreach (Match m in Regex.Matches(workLine, valueExpression))
{
string key = m.Groups["key"].Value.Trim();
string value = String.Empty;
value = Regex.Replace(m.Groups["value"].Value.Trim(), @"\s", String.Empty);
AddKeyValue(key, value);
}
}
private void AddKeyValue(string key, string value)
{
try
{
// Check if key already exists. If it does, remove the key and add the new key with updated value.
// Value information appends to what is already there so no data is lost.
if (this.ContainsKey(key))
{
this.Remove(key);
this.Add(key, value.Split('&'));
}
else
{
this.Add(key, value.Split('&'));
}
}
catch (ArgumentException)
{
// Already added to the dictionary.
}
}
}
Further information: the strings do not have a set number of spaces between each key/value pair, a string may not include all of the features, and the features aren't always in the same order. Welcome to parsing old telephone switch information.

I would create a dictionary from your input string
string input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";
var dict = Regex.Matches(input, @"(\w+?) = (.+?)( |$)").Cast<Match>()
.ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value);
Test the code:
foreach(var kv in dict)
{
Console.WriteLine(kv.Key + "=" + kv.Value);
}
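The pattern above assumes that every value is a single token. For the updated input, where a value may hold several items separated by commas or ampersands, one possible approach is to anchor on the known keys and wrap the whole alternation in word boundaries, so that DIVIDE is never matched as DIV. The following is only a sketch under that assumption; the names FeatureParser and Parse are hypothetical and not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not taken from the question or the answers.
public static class FeatureParser
{
    // Known feature names from the question.
    private const string Keys = "DIVIDE|DIV|CLACOS|INT";

    public static Dictionary<string, string[]> Parse(string input)
    {
        // \b(?:...)\b applies the word boundaries to every alternative,
        // so "DIV" cannot match inside "DIVIDE".
        // The lazy value runs up to the next "KEY =" or the end of the input.
        string pattern =
            @"\b(?<key>" + Keys + @")\b\s*=\s*(?<value>.*?)(?=\s*\b(?:" + Keys + @")\b\s*=|$)";

        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .ToDictionary(
                m => m.Groups["key"].Value,
                m => m.Groups["value"].Value
                      .Split(new[] { ',', '&' }, StringSplitOptions.RemoveEmptyEntries)
                      .Select(v => v.Trim())
                      .ToArray());
    }
}
```

Because the keys are matched explicitly, the number of spaces between pairs and the order of the features should not matter. A key that appears twice in one string would make ToDictionary throw, so that case needs separate handling.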

This might be a simple alternative for you.
Try this code:
var input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";
var parts = input.Split(new [] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);
var dictionary =
parts.Select((x, n) => new { x, n })
.GroupBy(xn => xn.n / 2, xn => xn.x)
.Select(xs => xs.ToArray())
.ToDictionary(xs => xs[0], xs => xs[1]);
I then get the following dictionary:
DIVIDE: KE48
CLACOS: 4556D
DIV: 3466
INT: 4567
Based on your updated input, things get more complicated, but this works:
var input = "DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568";
Func<string, char, string> tighten =
(i, c) => String.Join(c.ToString(), i.Split(c).Select(x => x.Trim()));
var parts =
tighten(tighten(input, '&'), ',')
.Split(new[] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);
var dictionary =
parts
.Select((x, n) => new { x, n })
.GroupBy(xn => xn.n / 2, xn => xn.x)
.Select(xs => xs.ToArray())
.ToDictionary(
xs => xs[0],
xs => xs
.Skip(1)
.SelectMany(x => x.Split(','))
.SelectMany(x => x.Split('&'))
.ToArray());
I get this dictionary (each value is a string array):
DIVIDE: KE48, KE49, KE50
CLACOS: 4566D
DIV: 3466
INT: 4567, 4568

Related

Remove duplicate combination of numbers in C# from csv

I'm trying to remove the duplicate combination from a csv file.
I tried using Distinct but it seems to stay the same.
string path;
string newcsvpath = @"C:\Documents and Settings\MrGrimm\Desktop\clean.csv";
OpenFileDialog openfileDial = new OpenFileDialog();
if (openfileDial.ShowDialog() == DialogResult.OK)
{
path = openfileDial.FileName;
var lines = File.ReadLines(path);
var grouped = lines.GroupBy(line => string.Join(", ", line.Split(',').Distinct())).ToArray();
var unique = grouped.Select(g => g.First());
var buffer = new StringBuilder();
foreach (var name in unique)
{
string value = name;
buffer.AppendLine(value);
}
File.WriteAllText(newcsvpath ,buffer.ToString());
label5.Text = "Complete";
}
For example, I have a combination of
{ 1,1,1,1,1,1,1,1 } { 1,1,1,1,1,1,1,2 }
{ 2,1,1,1,1,1,1,1 } { 1,1,1,2,1,1,1,1 }
The output should be
{ 1,1,1,1,1,1,1,1 }
{ 2,1,1,1,1,1,1,1 }

From your example, it seems that you want to treat each line as a sequence of numbers and that you consider two lines equal if one sequence is a permutation of the other.
So from reading your file, you have:
var lines = new[]
{
"1,1,1,1,1,1,1,1",
"1,1,1,1,1,1,1,2",
"2,1,1,1,1,1,1,1",
"1,1,1,2,1,1,1,1"
};
Now let's convert it to an array of number sequences:
var linesAsNumberSequences = lines.Select(line => line.Split(',')
.Select(int.Parse)
.ToArray())
.ToArray();
Or better, since we are not interested in permutations, we can sort the numbers in the sequences immediately:
var linesAsSortedNumberSequences = lines.Select(line => line.Split(',')
.Select(int.Parse)
.OrderBy(number => number)
.ToArray())
.ToArray();
When using Distinct on this, we have to pass a comparer which considers two arrays equal if they have the same elements. Let's use the one from this SO question:
var result = linesAsSortedNumberSequences.Distinct(new IEnumerableComparer<int>());
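Note that IEnumerableComparer<int> is not a framework type; it has to be copied from the linked question. As a sketch of an alternative that needs no custom comparer, the sorted numbers can be joined into a string key and the lines grouped by that key. The names PermutationDedup and DistinctIgnoringOrder are hypothetical, and the sketch assumes every field parses as an integer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not taken from the answer above.
public static class PermutationDedup
{
    // Keeps the first line of every group of lines that are permutations of each other.
    public static List<string> DistinctIgnoringOrder(IEnumerable<string> lines)
    {
        return lines
            .GroupBy(line => string.Join(",",
                line.Split(',')
                    .Select(field => int.Parse(field.Trim()))
                    .OrderBy(number => number)))   // sorted numbers form the group key
            .Select(group => group.First())        // original text of the first occurrence
            .ToList();
    }
}
```

For the four sample lines this keeps two lines, because the last three are permutations of one another. The result can then be written with File.WriteAllLines.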

Try it
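HashSet<string> record = new HashSet<string>();
foreach (DataRow row in dtCSV.Rows)
{
// Build a key from every column value of the row.
StringBuilder textEditor = new StringBuilder();
foreach (string col in columns)
{
textEditor.AppendFormat("[{0}={1}]", col, row[col].ToString());
}
// HashSet.Add returns false when the key is already present.
if (!record.Add(textEditor.ToString()))
{
// Duplicate row: skip or remove it here.
}
}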

C#: Loop over Textfile, split it and Print a new Textfile

I get many lines of String as an Input that look like this. The Input is a String that comes from
theObjects.Runstate;
each #VAR;****;#ENDVAR; represents one Line and one step in the loop.
#VAR;Variable=Speed;Value=Fast;Op==;#ENDVAR;#VAR;Variable=Fabricator;Value=Freescale;Op==;#ENDVAR;
I split it, to remove the unwanted fields, like #VAR,#ENDVAR and Op==.
The optimal Output would be:
Speed = Fast;
Fabricator = Freescale; and so on.
I am able to cut out the #VAR and the #ENDVAR. Cutting out the "Op==" won't be that hard, so that is not the main focus of the question. My biggest concern right now is that I want to write the output to a text file. To print an array I would have to loop over it, but in every iteration, when I get a new line, I overwrite the array with the current split string. I think the last line of the input is an empty string, so the output I get is just an empty text file. It would be nice if someone could help me.
string[] w;
TextWriter tw2;
foreach (EA.Element theObjects in myPackageObject.Elements)
{
theObjects.Type = "Object";
foreach (EA.Element theElements in PackageHW.Elements)
{
if (theObjects.ClassifierID == theElements.ElementID)
{
t = theObjects.RunState;
w = t.Replace("#ENDVAR;", "#VAR;").Replace("#VAR;", ";").Split(new string[] { ";" }, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in w)
{
tw2.WriteLine(s);
}
}
}
}

This LINQ query gives the expected result:
var keyValuePairLines = File.ReadLines(pathInputFile)
.Select(l =>
{
l = l.Replace("#VAR;", "").Replace("#ENDVAR;", "").Replace("Op==;", "");
IEnumerable<string[]> tokens = l.Split(new[]{';'}, StringSplitOptions.RemoveEmptyEntries)
.Select(t => t.Split('='));
return tokens.Select(t => {
return new KeyValuePair<string, string>(t.First(), t.Last());
});
});
foreach(var keyValLine in keyValuePairLines)
foreach(var keyVal in keyValLine)
Console.WriteLine("Key:{0} Value:{1}", keyVal.Key, keyVal.Value);
Output:
Key:Variable Value:Speed
Key:Value Value:Fast
Key:Variable Value:Fabricator
Key:Value Value:Freescale
If you want to output it to another text-file with one key-value pair on each line:
File.WriteAllLines(pathOutputFile, keyValuePairLines.SelectMany(l =>
l.Select(kv => string.Format("{0}:{1}", kv.Key, kv.Value))));
Edit according to your question in the comment:
"What would I have to change/add so that the Output is like this. I
need AttributeValuePairs, for example: Speed = Fast; or Fabricator =
Freescale ?"
Now I understand the logic: you have key-value pairs, but you are interested only in the values. Every two key-value pairs belong together; the value of the first names the attribute and the value of the second is that attribute's value (e.g. Speed=Fast).
Then it's a little bit more complicated:
var keyValuePairLines = File.ReadLines(pathInputFile)
.Select(l =>
{
l = l.Replace("#VAR;", "").Replace("#ENDVAR;", "").Replace("Op==;", "");
string[] tokens = l.Split(new[]{';'}, StringSplitOptions.RemoveEmptyEntries);
var lineValues = new List<KeyValuePair<string, string>>();
for(int i = 0; i < tokens.Length; i += 2)
{
// Value to a variable can be found on the next index, therefore i += 2
string[] pair = tokens[i].Split('=');
string key = pair.Last();
string value = null;
string nextToken = tokens.ElementAtOrDefault(i + 1);
if (nextToken != null)
{
pair = nextToken.Split('=');
value = pair.Last();
}
var keyVal = new KeyValuePair<string, string>(key, value);
lineValues.Add(keyVal);
}
return lineValues;
});
File.WriteAllLines(pathOutputFile, keyValuePairLines.SelectMany(l =>
l.Select(kv=>string.Format("{0} = {1}", kv.Key, kv.Value))));
Output in the file with your single sample-line:
Speed = Fast
Fabricator = Freescale
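The same output can also be sketched with a single regular expression, assuming every entry contains Variable=...; immediately followed by Value=...;. The names RunStateParser and ToAttributeLines are hypothetical; the pattern ignores the surrounding #VAR;, Op==; and #ENDVAR; tokens instead of removing them first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not taken from the answer above.
public static class RunStateParser
{
    // Turns "…Variable=Speed;Value=Fast;…" into "Speed = Fast", one string per entry.
    public static List<string> ToAttributeLines(string runState)
    {
        return Regex.Matches(runState, @"Variable=(?<name>[^;]*);Value=(?<value>[^;]*);")
            .Cast<Match>()
            .Select(m => m.Groups["name"].Value + " = " + m.Groups["value"].Value)
            .ToList();
    }
}
```

Collecting the lines of every element into one list and calling File.WriteAllLines once at the end avoids the problem of overwriting the array on each loop iteration.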

Occurrence of elements in the file with c# and Dictionary

I have a file as
outlook temperature Humidity Windy PlayTennis
sunny hot high false N
sunny hot high true N
overcast hot high false P
rain mild high false P
rain cool normal false P
rain cool normal true N
I want to find the number of occurrences of each element, e.g.
sunny: 2
rain: 3
overcast:1
hot: 3
and so on
My code is:
string file = openFileDialog1.FileName;
var text1 = File.ReadAllLines(file);
StringBuilder str = new StringBuilder();
string[] lines = File.ReadAllLines(file);
string[] nonempty=lines.Where(s => s.Trim(' ')!="")
.Select(s => Regex.Replace(s, @"\s+", " ")).ToArray();
string[] colheader = null;
if (nonempty.Length > 0)
colheader = nonempty[0].Split();
else
return;
var linevalue = nonempty.Skip(1).Select(l => l.Split());
int colcount = colheader.Length;
Dictionary<string, string> colvalue = new Dictionary<string, string>();
for (int i = 0; i < colcount; i++)
{
int k = 0;
foreach (string[] values in linevalue)
{
if(! colvalue.ContainsKey(values[i]))
{
colvalue.Add(values[i],colheader[i]);
}
label2.Text = label2.Text + k.ToString();
}
}
foreach (KeyValuePair<string, string> pair in colvalue)
{
label1.Text += pair.Key+ "\n";
}
Output I get here is
sunny
overcast
rain
hot
mild
cool
N
P
true
false
I also want to find the number of occurrences, which I am unable to get. Can you please help me out here?

This LINQ query will return Dictionary<string, int> which will contain each word in file as key, and word's occurrences as value:
var occurences = File.ReadAllLines(file).Skip(1) // skip titles line
.SelectMany(l => l.Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries))
.GroupBy(w => w)
.ToDictionary(g => g.Key, g => g.Count());
Usage of dictionary:
int sunnyOccurences = occurences["sunny"];
foreach(var pair in occurences)
label1.Text += String.Format("{0}: {1}\n", pair.Key, pair.Value);

Seems to me like you are implementing a simple tag cloud. I have used non-generic collections, but you can replace them with generic ones, e.g. replace the Hashtable with a Dictionary.
Follow this code:
Hashtable tagCloud = new Hashtable();
ArrayList frequency = new ArrayList();
Read from a file and store it as array
string[] lines = File.ReadAllLines("file.txt");
//use the specific delimiter
char[] delimiter = new char[] { ' ' };
StringBuilder buffer = new StringBuilder();
foreach (string line in lines)
{
if (line.ToString().Length != 0)
{
buffer.Append((" " + line.Trim()));
}
}
string[] words = buffer.ToString().Trim().Split(delimiter);
Storing occurrence of each word.
List<string> listOfWords = new List<string>(words);
foreach (string i in listOfWords)
{
int c = 0;
foreach (string j in words)
{
if (i.Equals(j))
c++;
}
frequency.Add(c);
}
Store as key-value pairs. The key will be the word and the value will be its occurrence count.
for (int i = 0; i < listOfWords.Count; i++)
{
//use dictionary here
// The indexer overwrites repeated words; Add would throw on a duplicate key.
tagCloud[listOfWords[i]] = (int)frequency[i];
}
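As suggested above, a generic Dictionary can replace the Hashtable, and the nested loops can be avoided by counting in a single pass. The following is a minimal sketch; the names WordCounter and Count are hypothetical, and the caller is assumed to have skipped the header row already.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not taken from the answer above.
public static class WordCounter
{
    // Counts how often each whitespace-separated word occurs in the given lines.
    public static Dictionary<string, int> Count(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (string line in lines)
        {
            foreach (string word in line.Split(new[] { ' ', '\t' },
                                                StringSplitOptions.RemoveEmptyEntries))
            {
                int current;
                counts.TryGetValue(word, out current); // current stays 0 for a new word
                counts[word] = current + 1;
            }
        }
        return counts;
    }
}
```

With the sample file from the question, this could be called as WordCounter.Count(File.ReadAllLines(file).Skip(1)), which needs System.IO and System.Linq.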

If all you want is the keyword and a count of how many times they appear in the file, then lazyberezovsky's solution is about as elegant of a solution as you will find. But if you need to do any other metrics on the file's data, then I would load the file into a collection that keeps your other metadata intact.
Something simple like:
var forecasts = File.ReadAllLines(file).Skip(1) // skip the header row
.Select(line => line.Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries)) // split the line into an array of strings
.Select (f =>
new
{
Outlook = f[0],
Temperature = f[1],
Humidity = f[2],
Windy = f[3],
PlayTennis = f[4]
});
will give you an IEnumerable<> of an anonymous type that has properties that can be queried.
For example if you wanted to see how many times "sunny" occurred in the Outlook then you could just use LINQ to do this:
var count = forecasts.Count( f => f.Outlook == "sunny");
Or if you just wanted the list of all outlooks you could write:
var outlooks = forecasts.Select(f => f.Outlook).Distinct();
Where this is useful is when you want to do more complicated queries like "How many rainy cool days are there?"
var count = forecasts.Count (f => f.Outlook == "rain" && f.Temperature == "cool");
Again if you just want all words and their occurrence count, then this is overkill.

How to make the custom parser for text file

I set up four columns in a DataTable and I want these columns to get their values from a text file. I used a regex to remove a particular line from the text file.
My objective is to show the text file in a grid using the DataTable, so first I am trying to create the DataTable and remove the line (shown in the program) using a regex.
Here I post my full code.
namespace class
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
StreamReader sreader = File.OpenText(@"C:\FareSearchRegex.txt");
string line;
DataTable dt = new DataTable();
DataRow dr;
dt.Columns.Add("PTC");
dt.Columns.Add("CUR");
dt.Columns.Add("TAX");
dt.Columns.Add("FARE BASIS");
while ((line = sreader.ReadLine()) != null)
{
var pattern = "---------- RECOMMENDATION 1 OF 3 IN GROUP 1 (USD 168.90)----------";
var result = Regex.Replace(line,pattern," ");
dt.Rows.Add(line);
}
}
}
class Class1
{
string PTC;
string CUR;
float TAX;
public string gsPTC
{
get{ return PTC; }
set{ PTC = value; }
}
public string gsCUR
{
get{ return CUR; }
set{ CUR = value; }
}
public float gsTAX
{
get{ return TAX; }
set{ TAX = value; }
}
}
}

If your format is strict (e.g. always 4 columns) and you want to remove only this complete line, I don't see any reason to use a regex:
var rows = File.ReadLines(@"C:\FareSearchRegex.txt")
.Where(l => l != "---------- RECOMMENDATION 1 OF 3 IN GROUP 1 (USD 168.90)----------")
.Select(l => new { line = l, items = l.Split(','), row = dt.Rows.Add() });
foreach (var x in rows)
x.row.ItemArray = x.items;
(assumed that the fields are separated by comma)
Edit: This works with your pastebin:
string header = " PTC CUR TAX FARE BASIS";
bool takeNextLine = false;
foreach (string line in File.ReadLines(@"C:\FareSearchRegex.txt"))
{
if (line.StartsWith(header))
takeNextLine = true;
else if (takeNextLine)
{
var tokens = line.Split(new[] { @" " }, StringSplitOptions.RemoveEmptyEntries);
dt.Rows.Add().ItemArray = tokens.Where((t, i) => i != 2).ToArray();
takeNextLine = false;
}
}
(Since you have an empty column which you want to exclude from the result, I've used the clumsy and possibly error-prone query Where((t, i) => i != 2).)

To parse the file you'll need to:
Split the text of the file into data chunks. A chunk, in your case, can be identified by the header PTC CUR TAX FARE BASIS and by the TOTAL line. To split the text you need to tokenize the input as follows: (i) define a regular expression to match the headers; (ii) define a regular expression to match the TOTAL lines (footers). Using (i) and (ii), you can join the matches by their order of appearance and determine the extent of each chunk (see the line with (x, y) => new{ StartIndex = x.Match.Index, EndIndex = y.Match.Index + y.Match.Length } below). Use the String.Substring method to separate the chunks.
Extract the data from each individual chunk. Knowing that data is split by lines you just have to iterate through all lines in a chunk (ignoring header and footer) and process each line.
This code should help:
string file = @"C:\FareSearchRegex.txt";
string text = File.ReadAllText(file);
var headerRegex = new Regex(@"^(\)>)?\s+PTC\s+CUR\s+TAX\s+FARE BASIS$", RegexOptions.IgnoreCase | RegexOptions.Multiline);
var totalRegex = new Regex(@"^\s+TOTAL[\w\s.]+?$",RegexOptions.IgnoreCase | RegexOptions.Multiline);
var lineRegex = new Regex(@"^(?<Num>\d+)?\s+(?<PTC>[A-Z]+)\s+\d+\s(?<Cur>[A-Z]{3})\s+[\d.]+\s+(?<Tax>[\d.]+)",RegexOptions.IgnoreCase | RegexOptions.Multiline);
var dataIndices =
headerRegex.Matches(text).Cast<Match>()
.Select((m, index) => new{ Index = index, Match = m })
.Join(totalRegex.Matches(text).Cast<Match>().Select((m, index) => new{ Index = index, Match = m }),
x => x.Index,
x => x.Index,
(x, y) => new{ StartIndex = x.Match.Index, EndIndex = y.Match.Index + y.Match.Length });
var items = dataIndices
.Aggregate(new List<string>(), (list, x) =>
{
var item = text.Substring(x.StartIndex, x.EndIndex - x.StartIndex);
list.Add(item);
return list;
});
var result = items.SelectMany(x =>
{
var lines = x.Split(new string[]{Environment.NewLine, "\r", "\n"}, StringSplitOptions.RemoveEmptyEntries);
return lines.Skip(1) //Skip header
.Take(lines.Length - 2) // Ignore footer
.Select(line =>
{
var match = lineRegex.Match(line);
return new
{
Ptc = match.Groups["PTC"].Value,
Cur = match.Groups["Cur"].Value,
Tax = Convert.ToDouble(match.Groups["Tax"].Value)
};
});
});

Convert array of strings to Dictionary<string, int> c# then output to Visual Studio

I have an array of strings like so:
[0]Board1
[1]Messages Transmitted75877814
[2]ISR Count682900312
[3]Bus Errors0
[4]Data Errors0
[5]Receive Timeouts0
[6]TX Q Overflows0
[7]No Handler Failures0
[8]Driver Failures0
[9]Spurious ISRs0
Just to clarify, the numbers in square brackets indicate each string's position in the array.
I want to convert the array of strings to a dictionary, with the text to the left of each number acting as the key, for example (ISR Count, 682900312).
I then want to output specific entries in the dictionary to a text box or table in Visual Studio (whichever is better). It would be preferable for the numbers to be left-aligned.
Excuse my naivety, I'm a newbie!

Pretty Simple. Tried and Tested
string[] arr = new string[] { "Board1", "ISR Count682900312", ... };
var numAlpha = new Regex("(?<Alpha>[a-zA-Z ]*)(?<Numeric>[0-9]*)");
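// Use .Value so the dictionary holds strings and ints rather than Group objects.
var res = arr.ToDictionary(x => numAlpha.Match(x).Groups["Alpha"].Value,
x => int.Parse(numAlpha.Match(x).Groups["Numeric"].Value));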

string[] strings =
{
"Board1", "Messages232"
};
Dictionary<string, int> dictionary = new Dictionary<string, int>();
foreach (var s in strings)
{
int index = 0;
for (int i = 0; i < s.Length; i++)
{
if (Char.IsDigit(s[i]))
{
index = i;
break;
}
}
dictionary.Add(s.Substring(0, index), int.Parse(s.Substring(index)));
}

var stringArray = new[]
{
"[0]Board1",
"[1]Messages Transmitted75877814",
"[2]ISR Count682900312",
"[3]Bus Errors0",
"[4]Data Errors0",
"[5]Receive Timeouts0",
"[6]TX Q Overflows0",
"[7]No Handler Failures0",
"[8]Driver Failures0",
"[9]Spurious ISRs0"
};
var resultDict = stringArray.Select(s => s.Substring(3))
.ToDictionary(s =>
{
int i = s.IndexOfAny("0123456789".ToCharArray());
return s.Substring(0, i);
},
s =>
{
int i = s.IndexOfAny("0123456789".ToCharArray());
return int.Parse(s.Substring(i));
});
EDIT: If the numbers in brackets are not included in the strings, remove .Select(s => s.Substring(3)).

Here you go:
string[] strA = new string[10]
{
"Board1",
"Messages Transmitted75877814",
"ISR Count682900312",
"Bus Errors0",
"Data Errors0",
"Receive Timeouts0",
"TX Q Overflows0",
"No Handler Failures0",
"Driver Failures0",
"Spurious ISRs0"
};
Dictionary<string, int> list = new Dictionary<string, int>();
foreach (var item in strA)
{
// this Regex matches any digit one or more times so it picks
// up all of the digits on the end of the string
var match = Regex.Match(item, #"\d+");
// this code will substring out the first part and parse the second as an int
list.Add(item.Substring(0, match.Index), int.Parse(match.Value));
}
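For the display part of the question, .NET composite format strings accept an alignment component, and a negative width left-aligns the value within its column. The following is only a sketch; the names CounterFormatter and FormatRow and the column widths 22 and 10 are assumptions chosen for the sample data.

```csharp
using System;

// Hypothetical helper, not taken from the answers above.
public static class CounterFormatter
{
    // Pads the name to 22 characters and the number to 10, both left-aligned.
    public static string FormatRow(string name, int value)
    {
        return string.Format("{0,-22}{1,-10}", name, value);
    }
}
```

A row could then be appended to a multi-line text box with something like textBox1.AppendText(CounterFormatter.FormatRow("ISR Count", list["ISR Count"]) + Environment.NewLine), where textBox1 is a hypothetical control name. Padding with spaces only lines up visually when the control uses a monospaced font.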
