String to Array, Sort by 3rd Word/Column


I have a string with numbers, words, and linebreaks that I split into an Array.
If I run Array.Sort(lines) it will sort the Array numerically by Column 1, Number.
How can I instead sort the Array alphabetically by Column 3, Color?
Note: They are not real columns, just spaces separating the words.
I cannot modify the string to change the results.
| Number | Name | Color |
|------------|------------|------------|
| 1 | Mercury | Gray |
| 2 | Venus | Yellow |
| 3 | Earth | Blue |
| 4 | Mars | Red |
C#
Example: http://rextester.com/LSP53065
string planets = "1 Mercury Gray\n"
+ "2 Venus Yellow\n"
+ "3 Earth Blue\n"
+ "4 Mars Red\n";
// Split String into Array by LineBreak
string[] lines = planets.Split(new string[] { "\n" }, StringSplitOptions.None);
// Sort
Array.Sort(lines);
// Result
foreach(var line in lines)
{
Console.WriteLine(line.ToString());
}
Desired Sorted Array Result
3 Earth Blue
1 Mercury Gray
4 Mars Red
2 Venus Yellow

Try this code:
string planets = "1 Mercury Gray \n"
+ "2 Venus Yellow \n"
+ "3 Earth Blue \n"
+ "4 Mars Red \n";
var lines = planets.Split("\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.OrderBy(s => s.Split(' ')[2])
.ToArray();
foreach (var line in lines)
{
Console.WriteLine(line);
}
EDIT: Thanks @Kevin!
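If the real input may contain repeated spaces, tabs, or blank lines, the split-and-sort above can be made a little more defensive. The following is only a sketch: the class `ColumnSort` and the method `SortLinesByColumn` are hypothetical names chosen for illustration, and the use of ordinal (case-sensitive) comparison is an assumption about the ordering you want.

```csharp
using System;
using System.Linq;

public static class ColumnSort
{
    // Hypothetical helper: sorts the lines of `text` by the word at `columnIndex` (0-based).
    public static string[] SortLinesByColumn(string text, int columnIndex)
    {
        char[] whitespace = { ' ', '\t' };
        return text
            // Split on both line-ending characters and drop blank lines.
            .Split(new[] { '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
            // Skip lines that have too few words instead of throwing.
            .Where(line => line.Split(whitespace, StringSplitOptions.RemoveEmptyEntries).Length > columnIndex)
            // Sort by the requested word; repeated spaces do not shift the column.
            .OrderBy(line => line.Split(whitespace, StringSplitOptions.RemoveEmptyEntries)[columnIndex],
                     StringComparer.Ordinal)
            .ToArray();
    }
}
```

Calling `ColumnSort.SortLinesByColumn(planets, 2)` on the string from the question should give the lines in the order Blue, Gray, Red, Yellow.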

Aleks has got the straight-up answer - I just wanted to contribute something from another angle.
This code is fine from an academic, just-learning-the-concepts point of view.
But if you want to translate this into something for business development, you should get in the habit of structuring it like this:
Develop a Planet class
Have a function that returns a Planet from a source text line
Have a function that displays a Planet how you intend it to be displayed.
There are a lot of reasons for this, but the big one is that you'll have reusable, flexible code (look at the function you're writing right now - how likely is it that you'll be able to reuse it down the line for something else?) If you're interested, look up some info on SRP (Single Responsibility Principle) to get more info on this concept.
This is a translated version of your code:
static void Main(string[] args)
{
string planetsDBStr = "1 Mercury Gray \n"
+ "2 Venus Yellow \n"
+ "3 Earth Blue \n"
+ "4 Mars Red \n";
List<Planet> planets = GetPlanetsFromDBString(planetsDBStr);
foreach (Planet p in planets.OrderBy(x => x.color))
{
Console.WriteLine(p.ToString());
}
Console.ReadKey();
}
private static List<Planet> GetPlanetsFromDBString(string dbString)
{
List<Planet> retVal = new List<Planet>();
string[] lines = dbString.Split("\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
foreach (string line in lines)
retVal.Add(new Planet(line));
return retVal;
}
public class Planet
{
public int orderInSystem;
public string name;
public string color;
public Planet(string databaseTextLine)
{
string[] parts = databaseTextLine.Split(' ');
this.orderInSystem = int.Parse(parts[0]);
this.name = parts[1];
this.color = parts[2];
}
public override string ToString()
{
return orderInSystem + " " + name + " " + color;
}
}
EDIT: Fixed some formatting issues

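You can use an Array.Sort overload that takes a custom comparer:
public class MyComparer : IComparer {
int IComparer.Compare( Object x, Object y ) {
// Compare the third space-separated word of each line
// (assumes every line has at least three words)
string a = ((string)x).Split(' ')[2];
string b = ((string)y).Split(' ')[2];
return string.Compare(a, b, StringComparison.Ordinal);
}
}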
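Array.Sort also has a generic overload that takes a `Comparison<T>` delegate, which avoids a separate comparer class. The sketch below assumes that lines with fewer than three words (such as the empty string left by a trailing "\n") should sort first; `InPlaceSort`, `ThirdWord`, and `SortByThirdWord` are hypothetical names used only for illustration.

```csharp
using System;

public static class InPlaceSort
{
    // Hypothetical helper: returns the third word of a line,
    // or "" when the line has fewer than three words.
    static string ThirdWord(string line)
    {
        string[] parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return parts.Length > 2 ? parts[2] : "";
    }

    // Sorts the array in place by the third word of each line (ordinal comparison).
    public static void SortByThirdWord(string[] lines)
    {
        Array.Sort(lines, (a, b) => string.CompareOrdinal(ThirdWord(a), ThirdWord(b)));
    }
}
```

Note that Array.Sort is not a stable sort, so lines with equal third words may end up in any relative order.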

Related

String split with specified string without delimiter

Updated - When searched value is in middle
string text = "Trio charged over alleged $100m money laundering syndicate at Merrylands, Guildford West";
string searchtext= "charged over";
string[] fragments = text.Split(new string[] { searchtext }, StringSplitOptions.None);
// Fragments:
// If [0] is blank, the searched text is at the beginning: searchtext + [1]
// If [1] is blank, the searched text is at the end: [0] + searchtext
// If the searched text is in the middle, both items have a value: [0] + searchtext + [1]
// A loop over fragments runs only twice because there are at most 2 values. The problem
// arises when the searched value is in the middle: the loop should run 3 times, because I have to
// apply different logic to the searched value (such as changing the background color of the text)
// and leave the background color of the head and tail unchanged.
// How do I insert the searched value between [0] and [1]?
I have a string without a delimiter that I am trying to split based on a searched string. My requirement is to split the string into two parts, one containing the string without the search text and the other containing the search text, like below:
Original String - "Bitcoin ATMs Highlight Flaws in EU Money Laundering Rules"
String 1 - Bitcoin ATMs Highlight Flaws in EU
String 2 - Money Laundering Rules
I have written the code below. It works for the sample value above, but it fails for this input:
Failed - does not return String 1 and String 2; the string is empty
string watch = " Money Laundering Rules Bitcoin ATMs Highlight Flaws in EU";
string serachetxt = "Money Laundering Rules";
This works -
List<string> matchedstr = new List<string>();
string watch = "Bitcoin ATMs Highlight Flaws in EU Money Laundering Rules";
string serachetxt = "Money Laundering Rules";
string compa = watch.Substring(0,watch.IndexOf(serachetxt)); //It returns "Bitcoin ATMs Highlight Flaws in EU"
matchedstr.Add(compa);
matchedstr.Add(serachetxt);
foreach(var itemco in matchedstr)
{
}

You could just consider "Money Laundering Rules" to be the delimiter. Then you can write
string[] result = watch.Split(new string[] { searchtext }, StringSplitOptions.None);
Then you can add the delimiter again
string result1 = result[0];
string result2 = searchtext + result[1];

Use string.Split.
string text = "Bitcoin ATMs Highlight Flaws in EU Money Laundering Rules";
string searchtext = "Money Laundering Rules";
string[] fragments = text.Split(new string[] { searchtext }, StringSplitOptions.None);
fragments will equal:
[0] "Bitcoin ATMs Highlight Flaws in EU "
[1] ""
Everywhere there is a gap between consecutive array elements, your search string appears. e.g.:
string originaltext = string.Join(searchtext, fragments);
Extended Description of String.Split Behaviour
Here is a quick table of the behaviour of string.Split when passed a string.
| Input | Split | Result Array |
+--------+-------+--------------------+
| "ABC" | "A" | { "", "BC" } |
| "ABC" | "B" | { "A", "C" } |
| "ABC" | "C" | { "AB", "" } |
| "ABC" | "D" | { "ABC" } |
| "ABC" | "ABC" | { "", "" } |
| "ABBA" | "A" | { "", "BB", "" } |
| "ABBA" | "B" | { "A", "", "A" } |
| "AAA" | "A" | { "", "", "", "" } |
| "AAA" | "AA" | { "", "A" } |
If you look at the table above, every place there was a comma in the array (between two consecutive elements in the array) is a place where the split string was found.
If the string was not found, then the result array is only one element (the original string).
If the split string is found at the beginning of the input string, then an empty string is set as the first element of the result array to represent the beginning of the string. Similarly, if the split string is found at the end of the string, an empty string is set as the last element of the result array.
Also, an empty string is included between any consecutive occurrences of the search string in the input string.
In cases where there are ambiguous overlapping locations at which the string could be found in the input string: (e.g. splitting AAA on AA could be split as AA|A or A|AA - where AA is found at position 0 or position 1 in the input string) then the earlier location is used. (e.g. AA|A, resulting in { "", "A" } ).
Again, the invariant is that the original string can always be reconstructed by joining all the fragments and placing exactly one occurrence of the search text in between elements. The following will always be true:
string.Join(searchtext, fragments) == text
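The invariant can be checked directly against the rows of the table above. This is a small sketch; `SplitInvariant`, `RoundTrips`, and `FragmentCount` are hypothetical names introduced for the example.

```csharp
using System;

public static class SplitInvariant
{
    // Returns true when joining the fragments with the separator rebuilds the input.
    public static bool RoundTrips(string text, string searchtext)
    {
        string[] fragments = text.Split(new[] { searchtext }, StringSplitOptions.None);
        return string.Join(searchtext, fragments) == text;
    }

    // Number of fragments produced: non-overlapping occurrences found, plus one.
    public static int FragmentCount(string text, string searchtext)
    {
        return text.Split(new[] { searchtext }, StringSplitOptions.None).Length;
    }
}
```

For example, `FragmentCount("AAA", "A")` is 4 and `FragmentCount("AAA", "AA")` is 2, matching the table, and `RoundTrips` holds for both. The invariant relies on StringSplitOptions.None; with RemoveEmptyEntries the empty fragments are dropped and the original string can no longer be rebuilt.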
If you only want the first split...
You can merge all results after the first back together like this:
if (fragments.Length > 1) {
fragments = new string[] { fragments[0], string.Join(searchtext, fragments.Skip(1)) };
}
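Another option for this case, as far as I know, is the String.Split overload that takes a maximum number of substrings: with a count of 2 it stops after the first occurrence and leaves the rest of the string untouched in the second element. A minimal sketch (the names `FirstSplit` and `SplitOnce` are hypothetical):

```csharp
using System;

public static class FirstSplit
{
    // Splits at the first occurrence only; later occurrences stay in the second part.
    // Returns a one-element array when searchtext does not occur in text.
    public static string[] SplitOnce(string text, string searchtext)
    {
        return text.Split(new[] { searchtext }, 2, StringSplitOptions.None);
    }
}
```

`FirstSplit.SplitOnce("red or white or blue", "or")` yields the two elements "red " and " white or blue".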
... or a more efficient way using String.IndexOf
If you just want to find the first location of the search text string then use String.IndexOf to get the position of the first occurrence of the search text in the input string.
Here's a complete function you can use
private static bool TrySplitOnce(string text, string searchtext, out string beforetext, out string aftertext)
{
int pos = text.IndexOf(searchtext);
if (pos < 0) {
// not found
beforetext = null;
aftertext = null;
return false;
} else {
// found at position `pos`
beforetext = text.Substring(0, pos); // may be ""
aftertext = text.Substring(pos + searchtext.Length); // may be ""
return true;
}
}
You can use this to produce an array, if you like.
usage:
string text = "red or white or blue";
string searchtext = "or";
if (TrySplitOnce(text, searchtext, out string before, out string after)) {
Console.WriteLine("{0}*{1}", before, after);
// output:
// red * white or blue
string[] array = new string[] { before, searchtext, after };
// array == { "red ", "or", " white or blue" };
Console.WriteLine(string.Join("|", array));
// output:
// red |or| white or blue
} else {
Console.WriteLine("Not found");
}
output:
red * white or blue
red |or| white or blue

You can write your own extension method for this:
// Splits s at sep with sep included at beginning of each part except first
// return no more than numParts parts
public static IEnumerable<string> SplitsBeforeInc(this string s, string sep, int numParts = Int32.MaxValue)
=> s.Split(new[] { sep }, numParts, StringSplitOptions.None).Select((p,i) => i > 0 ? sep+p : p);
And use it with:
foreach(var itemco in watch.SplitsBeforeInc(serachetxt, 2))
Here is the same method in a non-LINQ version:
// Splits s at sep with sep included at beginning of each part except first
// return no more than numParts parts
public static IEnumerable<string> SplitsBeforeInc(this string s, string sep, int numParts = Int32.MaxValue) {
var startPos = 0;
var searchPos = 0;
while (startPos < s.Length && --numParts > 0) {
var sepPos = s.IndexOf(sep, searchPos);
sepPos = sepPos < 0 ? s.Length : sepPos;
yield return s.Substring(startPos, sepPos - startPos);
startPos = sepPos;
searchPos = sepPos+sep.Length;
}
if (startPos < s.Length)
yield return s.Substring(startPos);
}

You can try this
string text = "Trio charged over alleged $100m money laundering syndicate at Merrylands, Guildford West";
string searchtext = "charged over";
// Regex.Escape keeps characters such as '$' or '.' in the search text from being treated as regex syntax
string searchtextPattern = "(?=" + Regex.Escape(searchtext) + ")";
string[] fragments= Regex.Split(text, searchtextPattern);
// fragments will have two elements here
// fragments[0] - "Trio "
// fragments[1] - "charged over alleged $100m money laundering syndicate at Merrylands, Guildford West"
Now you can split the fragment that contains the search text again, i.e. fragments[1] in this case.
See the code below:
var stringWithoutSearchText = fragments[1].Replace(searchtext, string.Empty);
You need to check whether each fragment contains the search text. You can do that in your foreach loop over fragments by adding the check below:
foreach (var item in fragments)
{
if (item.Contains(searchtext))
{
string stringWithoutSearchText = item.Replace(searchtext, string.Empty);
}
}
Reference : https://stackoverflow.com/a/521172/8652887
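If the goal is to get the head, the searched text, and the tail as three separate elements (so that the searched text can be styled differently), one possible variation is to put the escaped search text in a capturing group: Regex.Split includes captured text in its result. This is a sketch rather than part of the answer above, and `KeepDelimiterSplit` and `SplitKeepingMatch` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeepDelimiterSplit
{
    // Splits `text` around `searchtext`, keeping each occurrence as its own element.
    public static string[] SplitKeepingMatch(string text, string searchtext)
    {
        // Regex.Escape makes characters such as '$' or '.' match literally;
        // the parentheses form a capturing group, so the match is returned too.
        string pattern = "(" + Regex.Escape(searchtext) + ")";
        return Regex.Split(text, pattern);
    }
}
```

With this shape the searched text always sits at the odd indices of the result, and the element before or after it is an empty string when the match is at the very start or end of the input.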

In C#, what is the best way to parse this WIKI markup?

I need to take data that I am reading in from a WIKI markup page and store it as a table structure. I am trying to figure out how to properly parse the below markup syntax into some table data structure in C#
Here is an example table:
|| Owner || Action || Status || Comments ||
| Bill | Fix the lobby | In Progress | This is easy |
| Joe | Fix the bathroom | In Progress | Plumbing \\
\\
Electric \\
\\
Painting \\
\\
\\ |
| Scott | Fix the roof | Complete | This is expensive |
and here is how it comes in directly:
|| Owner|| Action || Status || Comments || | Bill\\ | fix the lobby |In Progress | This is eary| | Joe\\ |fix the bathroom\\ | In progress| plumbing \\Electric \\Painting \\ \\ | | Scott \\ | fix the roof \\ | Complete | this is expensive|
So as you can see:
The column headers have "||" as the separator
Each row's columns have a separator of "|"
A row might span multiple lines (as in the second data row example above), so I would have to keep reading until I hit the same number of "|" (cols) that I have in the header row.
I tried reading in line by line and then concatenating lines that had "\" in between them, but that seemed a bit hacky.
I also tried to simply read in as a full string and then just parse by "||" first and then keep reading until I hit the same number of "|" and then go to the next row. This seemed to work, but it feels like there might be a more elegant way using regular expressions or something similar.
Can anyone suggest the correct way to parse this data?

I have largely replaced the previous answer, due to the fact that the format of the input after your edit is substantially different from the one posted before. This leads to a somewhat different solution.
Because there are no longer any line breaks after a row, the only way to determine for sure where a row ends, is to require that each row has the same number of columns as the table header. That is at least if you don't want to rely on some potentially fragile white space convention present in the one and only provided example string (i.e. that the row separator is the only | not preceded by a space). Your question at least does not provide this as the specification for a row delimiter.
The below "parser" provides at least the error handling validity checks that can be derived from your format specification and example string and also allows for tables that have no rows. The comments explain what it is doing in basic steps.
public class TableParser
{
const StringSplitOptions SplitOpts = StringSplitOptions.None;
const string RowColSep = "|";
static readonly string[] HeaderColSplit = { "||" };
static readonly string[] RowColSplit = { RowColSep };
static readonly string[] MLColSplit = { @"\\" };
public class TableRow
{
public List<string[]> Cells;
}
public class Table
{
public string[] Header;
public TableRow[] Rows;
}
public static Table Parse(string text)
{
// Isolate the header columns and rows remainder.
var headerSplit = text.Split(HeaderColSplit, SplitOpts);
Ensure(headerSplit.Length > 1, "At least 1 header column is required in the input");
// Need to check whether there are any rows.
var hasRows = headerSplit.Last().IndexOf(RowColSep) >= 0;
var header = headerSplit.Skip(1)
.Take(headerSplit.Length - (hasRows ? 2 : 1))
.Select(c => c.Trim())
.ToArray();
if (!hasRows) // If no rows for this table, we are done.
return new Table() { Header = header, Rows = new TableRow[0] };
// Get all row columns from the remainder.
var rowsCols = headerSplit.Last().Split(RowColSplit, SplitOpts);
// Require same amount of columns for a row as the header.
Ensure((rowsCols.Length % (header.Length + 1)) == 1,
"The number of row columns does not match the number of header columns");
var rows = new TableRow[(rowsCols.Length - 1) / (header.Length + 1)];
// Fill rows by sequentially taking # header column cells
for (int ri = 0, start = 1; ri < rows.Length; ri++, start += header.Length + 1)
{
rows[ri] = new TableRow() {
Cells = rowsCols.Skip(start).Take(header.Length)
.Select(c => c.Split(MLColSplit, SplitOpts).Select(p => p.Trim()).ToArray())
.ToList()
};
};
return new Table { Header = header, Rows = rows };
}
private static void Ensure(bool check, string errorMsg)
{
if (!check)
throw new InvalidDataException(errorMsg);
}
}
When used like this:
public static void Main(params string[] args)
{
var wikiLine = @"|| Owner|| Action || Status || Comments || | Bill\\ | fix the lobby |In Progress | This is eary| | Joe\\ |fix the bathroom\\ | In progress| plumbing \\Electric \\Painting \\ \\ | | Scott \\ | fix the roof \\ | Complete | this is expensive|";
var table = TableParser.Parse(wikiLine);
Console.WriteLine(string.Join(", ", table.Header));
foreach (var r in table.Rows)
Console.WriteLine(string.Join(", ", r.Cells.Select(c => string.Join(Environment.NewLine + "\t# ", c))));
}
It will produce the below output:
Where "\t# " represents a newline caused by the presence of \\ in the input.

Here's a solution which populates a DataTable. It does require a little bit of data massaging (Trim), but the main parsing is Splits and Linq.
var str = @"|| Owner|| Action || Status || Comments || | Bill\\ | fix the lobby |In Progress | This is eary| | Joe\\ |fix the bathroom\\ | In progress| plumbing \\Electric \\Painting \\ \\ | | Scott \\ | fix the roof \\ | Complete | this is expensive|";
var headerStop = str.LastIndexOf("||");
var headers = str.Substring(0, headerStop).Split(new string[1] { "||" }, StringSplitOptions.None).Skip(1).ToList();
var records = str.Substring(headerStop + 4).TrimEnd(new char[2] { ' ', '|' }).Split(new string[1] { "| |" }, StringSplitOptions.None).ToList();
var tbl = new DataTable();
headers.ForEach(h => tbl.Columns.Add(h.Trim()));
records.ForEach(r => tbl.Rows.Add(r.Split('|')));

This makes some assumptions but seems to work for your sample data. I'm sure that if I worked at it I could combine the expressions and clean it up, but you'll get the idea.
It will also allow for rows that do not have the same number of cells as the header, which I think is something Confluence can do.
List<List<string>> table = new List<List<string>>();
var match = Regex.Match(raw, @"(?:(?:\|\|([^|]*))*\n)?");
if (match.Success)
{
var headersWithExtra = match.Groups[1].Captures.Cast<Capture>().Select(c=>c.Value);
List<String> headerRow = headersWithExtra.Take(headersWithExtra.Count()-1).ToList();
if (headerRow.Count > 0)
{
table.Add(headerRow);
}
}
match = Regex.Match(raw + "\r\n", @"[^\n]*\n" + @"(?:\|([^|]*))*");
var cellsWithExtra = match.Groups[1].Captures.Cast<Capture>().Select(c=>c.Value);
List<string> row = new List<string>();
foreach (string cell in cellsWithExtra)
{
if (cell.Trim(' ', '\t') == "\r\n")
{
if (!table.Contains(row) && row.Count > 0)
{
table.Add(row);
}
row = new List<string>();
}
else
{
row.Add(cell);
}
}

This ended up very similar to Jon Tirjan's answer, although it cuts the LINQ to a single statement (the code to replace that last one was horrifically ugly) and is a bit more extensible. For example, it will replace the Confluence line breaks \\ with a string of your choosing, you can choose to trim or not trim whitespace from around elements, etc.
private void ParseWikiTable(string input, string newLineReplacement = " ")
{
string separatorHeader = "||";
string separatorRow = "| |";
string separatorElement = "|";
input = Regex.Replace(input, @"[ \\]{2,}", newLineReplacement);
string inputHeader = input.Substring(0, input.LastIndexOf(separatorHeader));
string inputContent = input.Substring(input.LastIndexOf(separatorHeader) + separatorHeader.Length);
string[] headerArray = SimpleSplit(inputHeader, separatorHeader);
string[][] rowArray = SimpleSplit(inputContent, separatorRow).Select(r => SimpleSplit(r, separatorElement)).ToArray();
// do something with output data
TestPrint(headerArray);
foreach (var r in rowArray) { TestPrint(r); }
}
private string[] SimpleSplit(string input, string separator, bool trimWhitespace = true)
{
input = input.Trim();
if (input.StartsWith(separator)) { input = input.Substring(separator.Length); }
if (input.EndsWith(separator)) { input = input.Substring(0, input.Length - separator.Length); }
string[] segments = input.Split(new string[] { separator }, StringSplitOptions.None);
if (trimWhitespace)
{
for (int i = 0; i < segments.Length; i++)
{
segments[i] = segments[i].Trim();
}
}
return segments;
}
private void TestPrint(string[] lst)
{
string joined = "[" + String.Join("::", lst) + "]";
Console.WriteLine(joined);
}
Console output from your direct input string:
[Owner::Action::Status::Comments]
[Bill::fix the lobby::In Progress::This is eary]
[Joe::fix the bathroom::In progress::plumbing Electric Painting]
[Scott::fix the roof::Complete::this is expensive]

A generic regex solution that populates a DataTable and is a little flexible with the syntax.
var text = @"|| Owner|| Action || Status || Comments || | Bill\\ | fix the lobby |In Progress | This is eary| | Joe\\ |fix the bathroom\\ | In progress| plumbing \\Electric \\Painting \\ \\ | | Scott \\ | fix the roof \\ | Complete | this is expensive|";
// Get Headers
var regHeaders = new Regex(@"\|\|\s*(\w[^\|]+)", RegexOptions.Compiled);
var headers = regHeaders.Matches(text);
//Get Rows, based on number of headers columns
var regLinhas = new Regex(String.Format(@"(?:\|\s*(\w[^\|]+)){{{0}}}", headers.Count));
var rows = regLinhas.Matches(text);
var tbl = new DataTable();
foreach (Match header in headers)
{
tbl.Columns.Add(header.Groups[1].Value);
}
foreach (Match row in rows)
{
tbl.Rows.Add(row.Groups[1].Captures.OfType<Capture>().Select(col => col.Value).ToArray());
}

Here's a solution involving regular expressions. It takes a single string as input and produces a List<string> of headers and a List<List<string>> of rows/columns. It also trims white space, which may or may not be the desired behavior, so be aware of that. It even prints things nicely :)
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
namespace parseWiki
{
class Program
{
static void Main(string[] args)
{
string content = @"|| Owner || Action || Status || Comments || | Bill\\ | fix the lobby |In Progress | This is eary| | Joe\\ |fix the bathroom\\ | In progress| plumbing \\Electric \\Painting \\ \\ | | Scott \\ | fix the roof \\ | Complete | this is expensive|";
content = content.Replace(@"\\", "");
string headerContent = content.Substring(0, content.LastIndexOf("||") + 2);
string cellContent = content.Substring(content.LastIndexOf("||") + 2);
MatchCollection headerMatches = new Regex(@"\|\|([^|]*)(?=\|\|)", RegexOptions.Singleline).Matches(headerContent);
MatchCollection cellMatches = new Regex(@"\|([^|]*)(?=\|)", RegexOptions.Singleline).Matches(cellContent);
List<string> headers = new List<string>();
foreach (Match match in headerMatches)
{
if (match.Groups.Count > 1)
{
headers.Add(match.Groups[1].Value.Trim());
}
}
List<List<string>> body = new List<List<string>>();
List<string> newRow = new List<string>();
foreach (Match match in cellMatches)
{
if (newRow.Count > 0 && newRow.Count % headers.Count == 0)
{
body.Add(newRow);
newRow = new List<string>();
}
else
{
newRow.Add(match.Groups[1].Value.Trim());
}
}
body.Add(newRow);
print(headers, body);
}
static void print(List<string> headers, List<List<string>> body)
{
var CELL_SIZE = 20;
for (int i = 0; i < headers.Count; i++)
{
Console.Write(headers[i].Truncate(CELL_SIZE).PadRight(CELL_SIZE) + " ");
}
Console.WriteLine("\n" + "".PadRight( (CELL_SIZE + 2) * headers.Count, '-'));
for (int r = 0; r < body.Count; r++)
{
List<string> row = body[r];
for (int c = 0; c < row.Count; c++)
{
Console.Write(row[c].Truncate(CELL_SIZE).PadRight(CELL_SIZE) + " ");
}
Console.WriteLine("");
}
Console.WriteLine("\n\n\n");
Console.ReadKey(false);
}
}
public static class StringExt
{
public static string Truncate(this string value, int maxLength)
{
if (string.IsNullOrEmpty(value) || value.Length <= maxLength) return value;
return value.Substring(0, maxLength - 3) + "...";
}
}
}

Read the input string one character at a time and use a state-machine to decide what should be done with each input character. This approach probably needs more code, but it will be easier to maintain and to extend than regular expressions.
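A minimal sketch of such a state machine is shown below. It is written against the single-line sample from the question and rests on several assumptions that the question does not spell out: the table starts with "||", a single "|" after the last header cell opens the first row, a row is complete once it has as many cells as the header, and "\\" is a line break inside a cell. `WikiTableScanner`, `Parse`, and the state names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WikiTableScanner
{
    private enum State { BeforeHeader, InHeader, BetweenRows, InRow }

    // Hypothetical sketch: scans a single-line wiki table and returns its data rows.
    // A row is considered complete once it has as many cells as the header.
    public static List<List<string>> Parse(string input, out List<string> headers)
    {
        headers = new List<string>();
        var rows = new List<List<string>>();
        var cell = new StringBuilder();
        List<string> row = null;
        State state = State.BeforeHeader;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            bool hasNext = i + 1 < input.Length;

            // A doubled backslash is a line break inside a cell.
            if (c == '\\' && hasNext && input[i + 1] == '\\')
            {
                cell.Append('\n');
                i++;
                continue;
            }
            if (c != '|')
            {
                cell.Append(c);
                continue;
            }

            // c is '|'; its meaning depends on the current state.
            bool doubleBar = hasNext && input[i + 1] == '|';
            switch (state)
            {
                case State.BeforeHeader:
                    if (!doubleBar) throw new FormatException("Table must start with \"||\".");
                    i++;
                    state = State.InHeader;
                    break;
                case State.InHeader:
                    if (doubleBar)
                    {
                        headers.Add(cell.ToString().Trim());   // "||" closes a header cell
                        i++;
                    }
                    else
                    {
                        row = new List<string>();              // single "|" opens the first row
                        state = State.InRow;
                    }
                    break;
                case State.BetweenRows:
                    row = new List<string>();                  // "|" opens the next row
                    state = State.InRow;
                    break;
                case State.InRow:
                    row.Add(cell.ToString().Trim());           // "|" closes a cell
                    if (row.Count == headers.Count)
                    {
                        rows.Add(row);
                        state = State.BetweenRows;
                    }
                    break;
            }
            cell.Clear();                                      // text before this '|' has been consumed
        }
        // An unfinished trailing row is dropped in this sketch.
        return rows;
    }
}
```

Each new rule (for example escaped pipes inside a cell) would become one more branch or state, which is the maintainability argument made above.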

How to use Regex to split a string AND include whitespace

I can't seem to find (or write) a simple way of splitting the following sentence into words and assigning a word to the whitespace between the letters.
(VS 2010, C#, .net4.0).
String text = "This is a test.";
Desired result:
[0] = This
[1] = " "
[2] = is
[3] = " "
[4] = a
[5] = " "
[6] = test.
The closest I have come is:
string[] words = Regex.Split(text, @"\s");
but of course, this drops the whitespace.
Suggestions are appreciated. Thanks
Edit: There may be one or more spaces between the words. I would like all spaces between the words to be returned as a "word" itself (with all spaces being placed in that "word"). E.g., 5 spaces between two words would be:
String spaceword = " "; <--This is not showing correctly, there should be a string of 5 spaces.

Change your pattern to (\s+):
String text = "This        is a   test.";
string[] words = Regex.Split(text, @"(\s+)");
for(int i =0; i < words.Length;i++)
{
Console.WriteLine(i.ToString() + "," + words[i].Length.ToString() + " = " + words[i]);
}
Here's the output:
0,4 = This
1,8 =
2,2 = is
3,1 =
4,1 = a
5,3 =
6,5 = test.
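The reason this works is that Regex.Split includes the text matched by capturing groups in the returned array. One thing to watch for is leading or trailing whitespace, which produces an empty string at that end of the array. A small sketch that filters those out (the names `WhitespaceTokens` and `Tokenize` are hypothetical):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitespaceTokens
{
    // Splits text into alternating words and whitespace runs.
    public static string[] Tokenize(string text)
    {
        // The parentheses make Regex.Split return the separators too.
        return Regex.Split(text, @"(\s+)")
                    // Leading or trailing whitespace yields an empty string at that end; drop it.
                    .Where(token => token.Length > 0)
                    .ToArray();
    }
}
```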

You can use LINQ to add spaces manually between them:
var parts = text.Split(new[]{ ' ' }, StringSplitOptions.RemoveEmptyEntries);
var result = parts.SelectMany((x,idx) => idx != parts.Length - 1
? new[] { x, " " }
: new[] { x }).ToList();

You can try this regex, \S+|\s+, which uses the or operator |
var arr = Regex.Matches(text, @"\S+|\s+").Cast<Match>()
.Select(i => i.Value)
.ToArray();
It matches both words and runs of whitespace, and a little LINQ turns the matches into a string array, arr.

Finding the First Common Substring of a set of strings

I am looking for an implementation of a First Common Substring
Mike is not your average guy. I think you are great.
Jim is not your friend. I think you are great.
Being different is not your fault. I think you are great.
Using a Longest Common Substring implementation (and ignoring punctuation), you would get "I think you are great", but I am looking for the first occurring common substring, in this example:
is not your
Perhaps an implementation that generates an ordered list of all common substrings, from which I can just take the first.
Edit
The tokens being compared would be complete words. Looking for a greedy match of the first longest sequence of whole words. (Assuming a suffix tree was used in the approach, each node of the tree would be a word)

There are quite a few steps to do this.
Remove Punctuation
Break down Sentences into list of Words
Create string of all combinations of contiguous words (min:1, max:wordCount)
Join the three lists on new list of string (subsentences)
Sort Accordingly.
Code:
static void Main(string[] args)
{
var sentence1 = "Mike is not your average guy. I think you are great.";
var sentence2 = "Jim is not your friend. I think you are great.";
var sentence3 = "Being different is not your fault. I think you are great.";
//remove all punctuation
// http://stackoverflow.com/questions/421616
sentence1 = new string(
sentence1.Where(c => !char.IsPunctuation(c)).ToArray());
sentence2 = new string(
sentence2.Where(c => !char.IsPunctuation(c)).ToArray());
sentence3 = new string(
sentence3.Where(c => !char.IsPunctuation(c)).ToArray());
//separate into words
var words1 = sentence1.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries).ToList();
var words2 = sentence2.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries).ToList();
var words3 = sentence3.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries).ToList();
//create substring list
var subSentence1 = CreateSubstrings(words1);
var subSentence2 = CreateSubstrings(words2);
var subSentence3 = CreateSubstrings(words3);
//join them like a SQL table
var subSentences = subSentence1
.Join(subSentence2,
sub1 => sub1.Value,
sub2 => sub2.Value,
(sub1, sub2) => new { Sub1 = sub1,
Sub2 = sub2 })
.Join(subSentence3,
sub1 => sub1.Sub1.Value,
sub2 => sub2.Value,
(sub1, sub2) => new { Sub1 = sub1.Sub1,
Sub2 = sub1.Sub2,
Sub3 = sub2 })
;
//Sorted by Lowest Index, then by Maximum Words
subSentences = subSentences.OrderBy(s => s.Sub1.Rank)
.ThenByDescending(s => s.Sub1.Length)
.ToList();
//Sort by Maximum Words, then Lowest Index
/*subSentences = subSentences.OrderByDescending(s => s.Sub1.Length)
.ThenBy(s => s.Sub1.Rank)
.ToList();//*/
foreach (var subSentence in subSentences)
{
Console.WriteLine(subSentence.Sub1.Length.ToString() + " "
+ subSentence.Sub1.Value);
Console.WriteLine(subSentence.Sub2.Length.ToString() + " "
+ subSentence.Sub2.Value);
Console.WriteLine(subSentence.Sub3.Length.ToString() + " "
+ subSentence.Sub3.Value);
Console.WriteLine("======================================");
}
Console.ReadKey();
}
//this could probably be done better -Erik
internal static List<SubSentence> CreateSubstrings(List<string> words)
{
var result = new List<SubSentence>();
for (int wordIndex = 0; wordIndex < words.Count; wordIndex++)
{
var sentence = new StringBuilder();
int currentWord = wordIndex;
while (currentWord < words.Count - 1)
{
sentence.Append(words.ElementAt(currentWord));
result.Add(new SubSentence() { Rank = wordIndex,
Value = sentence.ToString(),
Length = currentWord - wordIndex + 1 });
sentence.Append(' ');
currentWord++;
}
sentence.Append(words.Last());
result.Add(new SubSentence() { Rank = wordIndex,
Value = sentence.ToString(),
Length = words.Count - wordIndex });
}
return result;
}
internal class SubSentence
{
public int Rank { get; set; }
public string Value { get; set; }
public int Length { get; set; }
}
Result:
3 is not your
3 is not your
3 is not your
======================================
2 is not
2 is not
2 is not
======================================
1 is
1 is
1 is
======================================
2 not your
2 not your
2 not your
======================================
1 not
1 not
1 not
======================================
1 your
1 your
1 your
======================================
5 I think you are great
5 I think you are great
5 I think you are great
======================================
4 I think you are
4 I think you are
4 I think you are
======================================
3 I think you
3 I think you
3 I think you
======================================
2 I think
2 I think
2 I think
======================================
1 I
1 I
1 I
======================================
4 think you are great
4 think you are great
4 think you are great
======================================
3 think you are
3 think you are
3 think you are
======================================
2 think you
2 think you
2 think you
======================================
1 think
1 think
1 think
======================================
3 you are great
3 you are great
3 you are great
======================================
2 you are
2 you are
2 you are
======================================
1 you
1 you
1 you
======================================
2 are great
2 are great
2 are great
======================================
1 are
1 are
1 are
======================================
1 great
1 great
1 great
======================================
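If only the first (and longest) common run of words is needed, the full list does not have to be built. The sketch below walks the first sentence from left to right, tries the longest run starting at each word first, and returns as soon as a run occurs in every sentence. It assumes case-sensitive, whole-word matching with punctuation removed; `FirstCommonPhrase`, `Words`, and `Find` are hypothetical names, and this is not the code that produced the listing above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstCommonPhrase
{
    // Strips punctuation and splits a sentence into words.
    static string[] Words(string sentence)
    {
        string cleaned = new string(sentence.Where(c => !char.IsPunctuation(c)).ToArray());
        return cleaned.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Returns the first run of whole words (in the order of the first sentence)
    // that occurs in every sentence, preferring the longest run at that position.
    public static string Find(IList<string> sentences)
    {
        if (sentences.Count == 0) return "";

        // Pad with spaces so that Contains only matches whole words.
        List<string> padded = sentences
            .Select(s => " " + string.Join(" ", Words(s)) + " ")
            .ToList();
        string[] first = Words(sentences[0]);

        for (int start = 0; start < first.Length; start++)
        {
            // Try the longest run first so the match is greedy.
            for (int length = first.Length - start; length >= 1; length--)
            {
                string phrase = string.Join(" ", first, start, length);
                if (padded.All(p => p.Contains(" " + phrase + " ")))
                    return phrase;
            }
        }
        return "";
    }
}
```

For the three example sentences this returns "is not your", which is the first block of the result listing.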

Here's a little something that will do what you want. You would adjust it to pre-build your list of strings and pass that in, and it will find the phrase for you. In this example, the shortest string is used as the baseline.
public void SomeOtherFunc()
{
    List<string> MyTest = new List<string>();
    MyTest.Add( "Mike is not your average guy. I think you are great." );
    MyTest.Add( "Jim is not your friend. I think you are great." );
    MyTest.Add( "Being different is not your fault. I think you are great." );
    string thePhrase = testPhrase( MyTest );
    MessageBox.Show( thePhrase );
}
public string testPhrase(List<string> test)
{
    // Start with the first string and find the shortest.
    // If we can't find a short string in a long one, we'll never find a long string in a short one.
    // Ex "To testing a string that is longer than some other string"
    // vs "Im testing a string that is short"
    // Work with the shortest string.
    string shortest = test[0];
    string lastGoodPhrase = "";
    string curTest;
    // -1 means "no common word found yet"; 0 is a valid word index
    int firstMatch = -1;
    int allFound;
    foreach (string s in test)
        if (s.Length < shortest.Length)
            shortest = s;
    // Now, we need to break the shortest string into each "word"
    string[] words = shortest.Split( ' ' );
    // Now, start with the first word until it is found in ALL phrases
    for (int i = 0; i < words.Length; i++)
    {
        // to prevent finding "this" vs "is"
        lastGoodPhrase = " " + words[i] + " ";
        allFound = 0;
        foreach (string s in test)
        {
            // always force a leading and trailing space so the first and last words can match too
            if ((" " + s + " ").Contains(lastGoodPhrase))
                allFound++;
            else
                // if not found in ANY string, it's not found in all, get out
                break;
        }
        if (allFound == test.Count)
        {
            // we've identified the first matched word, get out for next phase test
            firstMatch = i;
            break;
        }
    }
    // if no match, get out
    if (firstMatch < 0)
        return "";
    // we DO have at least a first match, now keep looking into each subsequent
    // word UNTIL we no longer have a match.
    for( int i = 1; i < words.Length - firstMatch; i++ )
    {
        // From where the first entry was, build out the ENTIRE PHRASE
        // until the end of the original string of words, adding one word at a time
        curTest = " ";
        for (int j = firstMatch; j <= firstMatch + i; j++)
            curTest += words[j] + " ";
        // see if all this is found in ALL strings
        foreach (string s in test)
            // we know we STARTED with a valid found phrase.
            // as soon as a string NO LONGER MATCHES the new phrase,
            // return the last VALID phrase
            if (!(" " + s + " ").Contains(curTest))
                return lastGoodPhrase;
        // if this is still a good phrase, set IT as the newest
        lastGoodPhrase = curTest;
    }
    return lastGoodPhrase;
}
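
For comparison, here is a more compact console sketch of the same idea: take the shortest string, walk its words, and grow the first word that occurs in every string into the longest run that still occurs in all of them. The names CommonPhrase and FirstCommonPhrase are made up for illustration, and the sketch assumes words are separated by single spaces and that punctuation counts as part of a word. It pads both ends of each string with a space so that only whole words match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonPhrase
{
    // Hypothetical helper: returns the first run of whole words (taken from the
    // shortest input) that appears in every input, extended while it still matches.
    public static string FirstCommonPhrase(IReadOnlyList<string> inputs)
    {
        if (inputs == null || inputs.Count == 0)
            return "";

        // Pad each input with spaces so only whole words can match.
        List<string> padded = inputs.Select(s => " " + s + " ").ToList();

        // A phrase common to all inputs must also occur in the shortest one.
        string[] words = inputs.OrderBy(s => s.Length).First()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        for (int start = 0; start < words.Length; start++)
        {
            string best = "";
            for (int end = start; end < words.Length; end++)
            {
                string candidate = string.Join(" ", words, start, end - start + 1);
                if (!padded.All(p => p.Contains(" " + candidate + " ")))
                    break;          // the run stopped matching, keep the last good one
                best = candidate;
            }
            if (best.Length > 0)
                return best;        // first word that matched everywhere, grown as far as possible
        }
        return "";
    }

    public static void Main()
    {
        var samples = new List<string>
        {
            "Mike is not your average guy. I think you are great.",
            "Jim is not your friend. I think you are great.",
            "Being different is not your fault. I think you are great."
        };
        Console.WriteLine(FirstCommonPhrase(samples)); // prints "is not your"
    }
}
```

Note that this returns the first common phrase, not the longest one: for the three sample strings it stops at "is not your" even though "I think you are great." is longer.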

C# output is somehow program name and class name

I am trying to follow the crazy formatting instructions my teacher gave me. After searching for probably an hour (this is my first C# program), I came up with this line of code:
`Console.WriteLine(String.Format("{0," + -longestTitle + "} | {1," + -longestAlbumTitle + "} | {2," + -longestArtist + "} | {3:0.00, 8} | {4," + -longestYearAndRating + "} |", songArray[arraySearcher].title, songArray[arraySearcher].albumTitle, songArray[arraySearcher].artist, songArray[arraySearcher].length, songArray[arraySearcher].yearAndRating));`
Each longestX is an int holding the number of characters in the longest X (where X = title, album, etc.).
The output I would like looks something like this:
Stuff | morestuff | extrastuff | 5.92 | 1992:R |
Stuf | est | sfafe | 232.44 | 2001:PG |
S uf | e | sfe | .44 | 2001:G |
(Where all padding is determined dynamically based on the longest title input by the user or file).
The output I get looks like this:
Program_Example.ClassName
Program_Example.ClassName
(or, specifically, Tyler_Music_Go.Song)
I have printed songArray[arraySearcher].title in this same method, and it works fine.
Could someone please help me?
Full relevant code:
class Song {
    public string title, albumTitle, yearAndRating, artist;
    public float length;
    public Song(string titl, string albumTitl, string art, float leng, string yrNRating)
    {
        title = titl;
        albumTitle = albumTitl;
        yearAndRating = yrNRating;
        length = leng;
        artist = art;
    }
}
//This class contains a Song array (with all Songs contained within), an array index, a search index, and ints to determine the longest of each category.
class SongList
{
    Song[] songArray;
    private int arrayKeeper, longestTitle, longestArtist, longestAlbumTitle, longestYearAndRating, checker;
    int arraySearcher = 0;
    public SongList()
    {
        songArray = new Song[10000];
        arrayKeeper = 0;
        longestTitle = 0;
        longestArtist = 0;
        longestAlbumTitle = 0;
        longestYearAndRating = 0;
    }
    public void AddSong(string title, string albumTitle, string artist, float length, string yearAndRating)
    {
        songArray[arrayKeeper] = new Song(title, albumTitle, artist, length, yearAndRating);
        arrayKeeper++;
        checker = 0;
        //This section of code is responsible for formatting the output. Since the longest values are already known, the list can be displayed quickly.
        //Once a song is deleted, however, previously calculated longest lengths still stand.
        foreach (char check in title)
        {
            checker++;
        }
        if (checker > longestTitle)
        {
            longestTitle = checker;
        }
        foreach (char check in albumTitle)
        {
            checker++;
        }
        if (checker > longestAlbumTitle)
        {
            longestAlbumTitle = checker;
        }
        foreach (char check in artist)
        {
            checker++;
        }
        if (checker > longestArtist)
        {
            longestArtist = checker;
        }
        foreach (char check in yearAndRating)
        {
            checker++;
        }
        if (checker > longestYearAndRating)
        {
            longestYearAndRating = checker;
        }
    }
    //public bool RemoveSong(string title)
    // {
    //}
    public void DisplayData()
    {
        Console.WriteLine("| Title | Album Title | Artist | Length | Year and Rating |");
        for (arraySearcher = 0; arraySearcher < arrayKeeper; arraySearcher++)
        {
            //This line for testing purposes. (works)
            Console.WriteLine(songArray[arraySearcher].title);
            Console.WriteLine(songArray[arraySearcher].ToString());
        }
    }
    public override string ToString()
    {
        //This line for testing purposes. (works)
        Console.WriteLine(songArray[arraySearcher].title);
        return String.Format("{0," + -longestTitle + "} | {1," + -longestAlbumTitle + "} | {2," + -longestArtist + "} | {3:0.00, 8} | {4," + -longestYearAndRating + "} |", songArray[arraySearcher].title, songArray[arraySearcher].albumTitle, songArray[arraySearcher].artist, songArray[arraySearcher].length, songArray[arraySearcher].yearAndRating);
    }
}
EDIT:
Well, now I feel all manner of stupid. I was overriding the ToString() method for SongList, and then calling the ToString() method on Song. The guy who answered made me realize it. Thanks to everyone who gave me advice, though.

You have to either access a property directly (songVariable.Title) or override ToString() in your Song class so that it returns the title. Without an override, ToString() falls back to the default implementation, which returns the fully qualified type name.
public class Song
{
    public string Title { get; set; }
    public override string ToString()
    {
        return Title;
    }
}
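
As a rough sketch of how the padded row from the question could live on Song itself: the ToRow method, its width parameters, and the SongDemo class are hypothetical names, and only three of the question's five columns are shown. One detail worth noting is that in a composite format item the alignment goes before the format string, as in {index,alignment:format}, so the question's {3:0.00, 8} would need to be written {3,8:0.00}.

```csharp
using System;
using System.Globalization;

public class Song
{
    public string Title { get; set; }
    public string Artist { get; set; }
    public float Length { get; set; }

    // Hypothetical helper: formats one row, left-aligning the text columns to the
    // given widths and right-aligning the length in 8 characters.
    public string ToRow(int titleWidth, int artistWidth)
    {
        // A negative alignment left-aligns; alignment comes before the format string.
        string layout = "{0," + -titleWidth + "} | {1," + -artistWidth + "} | {2,8:0.00} |";
        return string.Format(CultureInfo.InvariantCulture, layout, Title, Artist, Length);
    }

    // Overridden so that printing a Song shows its title instead of the type name.
    public override string ToString()
    {
        return Title;
    }
}

public static class SongDemo
{
    public static void Main()
    {
        var song = new Song { Title = "Stuff", Artist = "extrastuff", Length = 5.92f };
        Console.WriteLine(song);             // uses the override and prints the title
        Console.WriteLine(song.ToRow(8, 12));
        Console.WriteLine(new object());     // no override: prints the type name
    }
}
```

The list class would then pass its tracked maximum widths into ToRow for each element, rather than overriding ToString() on the list.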

String to Array, Sort by 3rd Word/Column - c#

You can use an Array.Sort overload that takes a custom comparer:
public class MyComparer : IComparer
{
    int IComparer.Compare( Object x, Object y )
    {
        // compare the last parts here
    }
}
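
A minimal sketch of what such a comparer might look like, using the generic IComparer<string> instead of the non-generic interface. ThirdWordComparer, SortDemo, and SortByThirdWord are illustrative names, and the sketch assumes the columns are separated by spaces and that a line with fewer than three words should sort as if its third word were empty.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: orders lines by their third space-separated word.
public class ThirdWordComparer : IComparer<string>
{
    private static string ThirdWord(string line)
    {
        string[] parts = (line ?? "").Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return parts.Length > 2 ? parts[2] : "";
    }

    public int Compare(string x, string y)
    {
        // Ordinal comparison keeps the result independent of the current culture.
        return string.Compare(ThirdWord(x), ThirdWord(y), StringComparison.Ordinal);
    }
}

public static class SortDemo
{
    public static string[] SortByThirdWord(string text)
    {
        string[] lines = text.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        Array.Sort(lines, new ThirdWordComparer());
        return lines;
    }

    public static void Main()
    {
        string planets = "1 Mercury Gray\n2 Venus Yellow\n3 Earth Blue\n4 Mars Red\n";
        foreach (string line in SortByThirdWord(planets))
            Console.WriteLine(line);
        // prints the lines in color order: Blue, Gray, Red, Yellow
    }
}
```

Array.Sort is not a stable sort, so lines that share the same third word may come out in either order.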
