Parse Text Row with Empty Spaces

Parse Text Row with Empty Spaces - c#

I have a file, the text format is like this:
.640 .070 -.390 -.740 -1.030 -1.410 -1.780 -1.840
-1.360 -.360 .860 1.880 2.340 2.250 1.950 1.710
1.410 .700 -.300 -.840 -.280 1.020 1.860 1.460
.310 -.460 -.320 .350 1.020 1.650 2.430 3.070
2.840 1.440 -.460 -1.650 -1.520 -.520 .250 .190
-.420 -.870 -.800 -.280 .570 1.660 2.500 2.220
.520 -1.560 -2.530 -2.030 -1.200 -1.060 -1.230 -.600
.990 2.300 2.180 .940 -.090 -.140 .320 .470
.330 .420 .830 1.080 1.090 1.530 2.740 3.800
3.410 1.610 -.150 -.900 -1.120 -1.640 -2.140 -1.590
.210 2.210 3.290 3.170 2.380 1.880 2.530 4.210
5.280 3.820 -.040 -3.670 -4.190 -1.260 2.930 5.740
5.980 3.920 .540 -2.890 -5.010 -4.780 -2.150 1.640
4.670 5.540 4.230 1.950 .120 -.470 -.010 .340
-.710 -2.940 -4.070 -1.810 3.000 6.590 6.140 2.750
-.490 -2.460 -4.180 -5.660 -4.800 -.560 4.510 6.630
5.140 2.860 2.230 2.510 1.670 -.440 -2.030 -2.330
Note that there are a lot of white characters between one value and another.
I tried to read each line, and then split the line according to a ' ' character. My code is something like this:
public List<double> Parse(StreamReader sr)
{
var dataList = new List<double>();
while (sr.Peek() >= 0)
{
string line = sr.ReadLine();
if (lineCount > 1)
{
string[] columns = line.Split(' ');
for (var j = 0; j < columns.Length; j++)
{
dataList.Add(double.Parse(columns[j]) ));
}
}
}
return dataList ;
}
The problem with the above code is that it is only able to handle the case where values are separated by a single white character.
Any idea ?

The simplest way is probably to use an overload of String.Split which includes a StringSplitOptions parameter, and specify StringSplitOptions.RemoveEmptyEntries.
I would also personally just call ReadLine until that returned null, rather than using TextReader.Peek. Aside from anything else, it's more general - it will work even if the underlying stream (if any) doesn't support seeking.

Before you do the split, replace all multi spaces with a single space, something like:
line = System.Text.RegularExpressions.Regex.Replace(line, #" +", #" ");

You may use the simple one line code for this. Let your text is in the string named input.
string[] values = System.Text.RegularExpressions.Regex.Split(input, #"\s+");
You will get all values in a string array simply

Related

How to replace string that's only before a symbol

Doing practice with a note app, I have a text file that contain notes by line numbering like this:
1) First note
2) Second note
n) n note
Anticipating that a user may remove a note from the list, I want to avoid having these notes un-organized, so the lines numbers would be re-organized automatically.
int NumérotationFromTheUserTextFile, NumérotationInOrder = 1;
string[] Strings = File.ReadAllLines(logPath);
for(int i=0; i<Strings.Length; i++)
{
using (TextReader reader = File.OpenText(logPath))
{
string text = File.ReadLines(logPath).Skip(i).Take(1).First();
string[] bits = text.Split(')');
if(int.TryParse(bits[0], out int x))
{
NumérotationFromTheUserTextFile = x;
if (NumérotationFromTheUserTextFile != NumérotationInOrder)
{
Strings[i] = Strings[i].Replace(NumérotationFromTheUserTextFile.ToString(CultureInfo.InvariantCulture), NumérotationInOrder.ToString(CultureInfo.InvariantCulture));
}
NumérotationInOrder++;
}
}
}
File.WriteAllLines(logPath, Strings);
The code above does the job EXCEPT it does affect/change all other similar numbers at the same line. and I want it to stop editing at each line by this symbol ')'.
Any suggestion on how to improve it would be appreciated.

As I mentioned in the comments, it seems like a design flaw to store the line number with the line itself. That's something you can determine at runtime when you load the note, and you can display them in your UI without writing them to the file. Since it's obvious that the user has access to modify the file outside of your application, the fewer "rules" you have about the contents of each line the better.
However, if you need to store '#) at the start of each line, then we need to validate a couple of things:
If the line does not contain a ')' character, add #) to the beginning of the line
If the line doesn't start with numeric text, add #) to the beginning of the line
If the number before the ')' character is not correct, modify it.
Here's one way to do it:
public static void ReNumberLines(string filePath)
{
if (!File.Exists(filePath)) throw new FileNotFoundException(nameof(filePath));
var lines = File.ReadAllLines(filePath);
for (int i = 0; i < lines.Length; i++)
{
var num = i + 1;
var parts = lines[i].Split(new[] {')'}, 2); // Split on the first ')'
// If there was no ')' character, add a line number to this line
if (parts.Length == 1)
{
lines[i] = $"{num}) {parts[0].Trim()}";
continue;
}
// Try to determine if we need to modify the beginning of the line.
// If it starts with non-numeric characters, add '#)'
// Otherwise if the number doesn't match, replace it with the correct #
int number;
if (int.TryParse(parts[0].Trim(), out number))
{
// There was a number before the ')', so if it matches we can continue
if (number == num) continue;
// Otherwise replace it with the correct number
lines[i] = $"{num}) {parts[1].Trim()}";
}
else
{
// The characters before the ')' are not numeric, so we have to assume that
// user content is here and prefix this line with the correct number and ')'
lines[i] = $"{num}) {parts[0]}){parts[1]}";
}
}
File.WriteAllLines(filePath, lines);
}

PigLatin how can I strip punctuation from a string? And Then add it back?

Working on program for class call pig Latin. It works for what I need for class. It ask just to type in a phase to convert. But I notice if I type a sentence with punctuation at the end it will mess up the last word translation. Trying to figure out the best way to fix this. New at programming but I would need away for it to check last character in word to check for punctuations. Remove it before translation and then add it back. Not sure how to do that. Been reading about char.IsPunctuation. Plus not sure what part of my code I would had for that check.
public static string MakePigLatin(string str)
{
string[] words = str.Split(' ');
str = String.Empty;
for (int i = 0; i < words.Length; i++)
{
if (words[i].Length <= 1) continue;
string pigTrans = new String(words[i].ToCharArray());
pigTrans = pigTrans.Substring(1, pigTrans.Length - 1) + pigTrans.Substring(0, 1) + "ay ";
str += pigTrans;
}
return str.Trim();
}

The following should get you strings of letters for converting while passing through any non-letter characters that follow them.
Splitter based on Splitting a string in C#
public static string MakePigLatin(string str) {
MatchCollection matches = Regex.Matches(str, #"([a-zA-Z]*)([^a-zA-Z]*)");
StringBuilder result = new StringBuilder(str.Length * 2);
for (int i = 0; i < matches.Count; ++i) {
string pigTrans = matches[i].Groups[1].Captures[0].Value ?? string.Empty;
if (pigTrans.Length > 1) {
pigTrans = pigTrans.Substring(1) + pigTrans.Substring(0, 1) + "ay";
}
result.Append(pigTrans).Append(matches[i].Groups[2].Captures[0].Value);
}
return result.ToString();
}
The matches variable should contain all the match collections of 2 groups. The first group will be 0 or more letters to translate followed by a second group of 0 or more non-letters to pass through. The StringBuilder should be more memory efficient than concatenating System.String values. I gave it a starting allocation of double the initial string size just to avoid having to double the allocated space. If memory is tight, maybe 1.25 or 1.5 instead of 2 would be better, but you'd probably have to convert it back to int after. I took the length calculation off your Substring call because leaving it out grabs everything to the end of the string already.

Parse Text File Into Dictionary

I have a text file that has several hundred configuration values. The general format of the configuration data is "Label:Value". Using C# .net, I would like to read these configurations, and use the Values in other portions of the code. My first thought is that I would use a string search to look for the Labels then parse out the values following the labels and add them to a dictionary, but this seems rather tedious considering the number of labels/values that I would have to search for. I am interested to hear some thoughts on a possible architecture to perform this task. I have included a small section of a sample text file that contains some of the labels and values (below). A couple of notes: The Values are not always numeric (as seen in the AUX Serial Number); For whatever reason, the text files were formatted using spaces (\s) rather than tabs (\t). Thanks in advance for any time you spend thinking about this.
Sample Text:
AUX Serial Number: 445P000023 AUX Hardware Rev: 1
Barometric Pressure Slope: -1.452153E-02
Barometric Pressure Intercept: 9.524336E+02

This is a nice little brain tickler. I think this code might be able to point you in the right direction. Keep in mind, this fills a Dictionary<string, string>, so there are no conversions of values into ints or the like. Also, please excuse the mess (and the poor naming conventions). It was a quick write-up based on my train of thought.
Dictionary<string, string> allTheThings = new Dictionary<string, string>();
public void ReadIt()
{
// Open the file into a streamreader
using (System.IO.StreamReader sr = new System.IO.StreamReader("text_path_here.txt"))
{
while (!sr.EndOfStream) // Keep reading until we get to the end
{
string splitMe = sr.ReadLine();
string[] bananaSplits = splitMe.Split(new char[] { ':' }); //Split at the colons
if (bananaSplits.Length < 2) // If we get less than 2 results, discard them
continue;
else if (bananaSplits.Length == 2) // Easy part. If there are 2 results, add them to the dictionary
allTheThings.Add(bananaSplits[0].Trim(), bananaSplits[1].Trim());
else if (bananaSplits.Length > 2)
SplitItGood(splitMe, allTheThings); // Hard part. If there are more than 2 results, use the method below.
}
}
}
public void SplitItGood(string stringInput, Dictionary<string, string> dictInput)
{
StringBuilder sb = new StringBuilder();
List<string> fish = new List<string>(); // This list will hold the keys and values as we find them
bool hasFirstValue = false;
foreach (char c in stringInput) // Iterate through each character in the input
{
if (c != ':') // Keep building the string until we reach a colon
sb.Append(c);
else if (c == ':' && !hasFirstValue)
{
fish.Add(sb.ToString().Trim());
sb.Clear();
hasFirstValue = true;
}
else if (c == ':' && hasFirstValue)
{
// Below, the StringBuilder currently has something like this:
// " 235235 Some Text Here"
// We trim the leading whitespace, then split at the first sign of a double space
string[] bananaSplit = sb.ToString()
.Trim()
.Split(new string[] { " " },
StringSplitOptions.RemoveEmptyEntries);
// Add both results to the list
fish.Add(bananaSplit[0].Trim());
fish.Add(bananaSplit[1].Trim());
sb.Clear();
}
}
fish.Add(sb.ToString().Trim()); // Add the last result to the list
for (int i = 0; i < fish.Count; i += 2)
{
// This for loop assumes that the amount of keys and values added together
// is an even number. If it comes out odd, then one of the lines on the input
// text file wasn't parsed correctly or wasn't generated correctly.
dictInput.Add(fish[i], fish[i + 1]);
}
}

So the only general approach that I can think of, given the format that you're limited to, is to first find the first colon on the line and take everything before it as the label. Skip all whilespace characters until you get to the first non-whitespace character. Take all non-whitespace characters as the value of the label. If there is a colon after the end of that value take everything after the end of the previous value to the colon as the next value and repeat. You'll also probably need to trim whitespace around the labels.
You might be able to capture that meaning with a regex, but it wouldn't likely be a pretty one if you could; I'd avoid it for something this complex unless you're entire development team is very proficient with them.

I would try something like this:
While string contains triple space, replace it with double space.
Replace all ": " and ": " (: with double space) with ":".
Replace all " " (double space) with '\n' (new line).
If line don't contain ':' than skip the line. Else, use string.Split(':'). This way you receive arrays of 2 strings (key and value). Some of them may contain empty characters at the beginning or at the end.
Use string.Trim() to get rid of those empty characters.
Add received key and value to Dictionary.
I am not sure if it solves all your cases but it's a general clue how I would try to do it.
If it works you could think about performance (use StringBuilder instead of string wherever it is possible etc.).

This is probably the dirtiest function I´ve ever written, but it works.
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
string[] data = line.Split(':');
if (line != String.Empty)
{
for (int i = 0; i < data.Length - 1; i++)
{
if (i != 0)
{
bool isPair;
if (i % 2 == 0)
{
isPair = true;
}
else
{
isPair = false;
}
if (isPair)
{
string keyOdd = data[i].Trim();
try { keyOdd = keyOdd.Substring(keyOdd.IndexOf(' ')).TrimStart(); }
catch { }
string valueOdd = data[i + 1].TrimStart();
try { valueOdd = valueOdd.Remove(valueOdd.IndexOf(' ')); } catch{}
yourDic.Add(keyOdd, valueOdd);
}
else
{
string keyPair = data[i].TrimStart();
keyPair = keyPair.Substring(keyPair.IndexOf(' ')).Trim();
string valuePair = data[i + 1].TrimStart();
try { valuePair = valuePair.Remove(valuePair.IndexOf(' ')); } catch { }
yourDic.Add(keyPair, valuePair);
}
}
else
{
string key = data[i].Trim();
string value = data[i + 1].TrimStart();
try { value = value.Remove(value.IndexOf(' ')); } catch{}
yourDic.Add(key, value);
}
}
}
}
How does it works?, well splitting the line you can know what you can get in every position of the array, so I just play with the even and odd values.
You will understand me when you debug this function :D. It fills the Dictionary that you need.

I have another idea. Does values contain spaces? If not you could do like this:
Ignore white spaces until you read some other char (first char of key).
Read string until ':' occures.
Trim key that you get.
Ignore white spaces until you read some other char (first char of value).
Read until you get empty char.
Trim value that you get.
If it is the end than stop. Else, go back to step 1.
Good luck.

Maybe something like this would work, be careful with the ':' character
StreamReader reader = new StreamReader("c:/yourFile.txt");
Dictionary<string, string> yourDic = new Dictionary<string, string>();
while (reader.Peek() >= 0)
{
string line = reader.ReadLine();
yourDic.Add(line.Split(':')[0], line.Split(':')[1]);
}
Anyway, I recommend to organize that file in some way that you´ll always know in what format it comes.

How to remove a duplicate set of characters in a string

For example a string contains the following (the string is variable):
http://www.google.comhttp://www.google.com
What would be the most efficient way of removing the duplicate url here - e.g. output would be:
http://www.google.com

I assume that input contains only urls.
string input = "http://www.google.comhttp://www.google.com";
// this will get you distinct URLs but without "http://" at the beginning
IEnumerable<string> distinctAddresses = input
.Split(new[] {"http://"}, StringSplitOptions.RemoveEmptyEntries)
.Distinct();
StringBuilder output = new StringBuilder();
foreach (string distinctAddress in distinctAddresses)
{
// when building the output, insert "http://" before each address so
// that it resembles the original
output.Append("http://");
output.Append(distinctAddress);
}
Console.WriteLine(output);

Efficiency has various definitions: code size, total execution time, CPU usage, space usage, time to write the code, etc. If you want to be "efficient", you should know which one of these you're trying for.
I'd do something like this:
string url = "http://www.google.comhttp://www.google.com";
if (url.Length % 2 == 0)
{
string secondHalf = url.Substring(url.Length / 2);
if (url.StartsWith(secondHalf))
{
url = secondHalf;
}
}
Depending on the kinds of duplicates you need to remove, this may or may not work for you.

collect strings into list and use distinct, if your string has http address you can apply regex http:.+?(?=((http:)|($)) with RegexOptions.SingleLine
var distinctList = list.Distinct(StringComparer.CurrentCultureIgnoreCase).ToList();

Given you don't know the length of the string, you don't know if something is double and you don't know what is double:
string yourprimarystring = "http://www.google.comhttp://www.google.com";
int firstCharacter;
string temp;
for(int i = 0; i <= yourprimarystring.length; i++)
{
for(int j = 0; j <= yourprimarystring.length; j++)
{
string search = yourprimarystring.substring(i,j);
firstCharacter = yourprimaryString.IndexOf(search);
if(firstCharacter != -1)
{
temp = yourprimarystring.substring(0,firstCharacter) + yourprimarystring.substring(firstCharacter + j - i,yourprimarystring.length)
yourprimarystring = temp;
}
}
This itterates through all your elements, takes all out from first to last letter and searches for them like this:
ABCDA - searches for A finds A exludes A, thats the problem, you need to specify how long the duplication needs to be if you want to make it variable, but maybe my code helps you.

How to split a string while preserving line endings?

I have a block of text and I want to get its lines without losing the \r and \n at the end. Right now, I have the following (suboptimal code):
string[] lines = tbIn.Text.Split('\n')
.Select(t => t.Replace("\r", "\r\n")).ToArray();
So I'm wondering - is there a better way to do it?
Accepted answer
string[] lines = Regex.Split(tbIn.Text, #"(?<=\r\n)(?!$)");

The following seems to do the job:
string[] lines = Regex.Split(tbIn.Text, #"(?<=\r\n)(?!$)");
(?<=\r\n) uses 'positive lookbehind' to match after \r\n without consuming it.
(?!$) uses negative lookahead to prevent matching at the end of the input and so avoids a final line that is just an empty string.

Something along the lines of using this regular expression:
[^\n\r]*\r\n
Then use Regex.Matches().
The problem is you need Group(1) out of each match and create your string list from that. In Python you'd just use the map() function. Not sure the best way to do it in .NET, you take it from there ;-)

Dmitri, your solution is actually pretty compact and straightforward. The only thing more efficient would be to keep the string-splitting characters in the generated array, but the APIs simply don't allow for that. As a result, every solution will require iterating over the array and performing some kind of modification (which in C# means allocating new strings every time). I think the best you can hope for is to not re-create the array:
string[] lines = tbIn.Text.Split('\n');
for (int i = 0; i < lines.Length; ++i)
{
lines[i] = lines[i].Replace("\r", "\r\n");
}
... but as you can see that looks a lot more cumbersome! If performance matters, this may be a bit better. If it really matters, you should consider manually parsing the string by using IndexOf() to find the '\r's one at a time, and then create the array yourself. This is significantly more code, though, and probably not necessary.
One of the side effects of both your solution and this one is that you won't get a terminating "\r\n" on the last line if there wasn't one already there in the TextBox. Is this what you expect? What about blank lines... do you expect them to show up in 'lines'?

If you are just going to replace the newline (\n) then do something like this:
string[] lines = tbIn.Text.Split('\n')
.Select(t => t + "\r\n").ToArray();
Edit: Regex.Replace allows you to split on a string.
string[] lines = Regex.Split(tbIn.Text, "\r\n")
.Select(t => t + "\r\n").ToArray();

As always, extension method goodies :)
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
usage:
string text = "One,Two,Three,Four";
foreach (var s in text.SplitAndKeep(","))
{
Console.WriteLine(s);
}
Output:
One,
Two,
Three,
Four

You can achieve this with a regular expression. Here's an extension method with it:
public static string[] SplitAndKeepDelimiter(this string input, string delimiter)
{
MatchCollection matches = Regex.Matches(input, #"[^" + delimiter + "]+(" + delimiter + "|$)", RegexOptions.Multiline);
string[] result = new string[matches.Count];
for (int i = 0; i < matches.Count ; i++)
{
result[i] = matches[i].Value;
}
return result;
}
I'm not sure if this is a better solution. Yours is very compact and simple.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Parse Text Row with Empty Spaces - c#

Before you do the split, replace all multi spaces with a single space, something like: line = System.Text.RegularExpressions.Regex.Replace(line, #" +", #" ");

You may use the simple one line code for this. Let your text is in the string named input. string[] values = System.Text.RegularExpressions.Regex.Split(input, #"\s+"); You will get all values in a string array simply

Related

How to replace string that's only before a symbol

PigLatin how can I strip punctuation from a string? And Then add it back?

Parse Text File Into Dictionary

How to remove a duplicate set of characters in a string

How to split a string while preserving line endings?

Categories

Resources