Complex string split C# - c#

I have an input file like this:
input.txt
aa#aa.com bb#bb.com "Information" "Hi there"
cc#cc.com dd#dd.com "Follow up" "Interview"
I have used this method:
string[] words = item.Split(' ');
However, this splits on every space. My quoted strings also contain spaces, and I don't want to split on those.
Basically I want to parse this input from file to this output:
From = aa#aa.com
To = bb#bb.com
Subject = Information
Body = Hi there
How do I split these strings in C#?

You can simply use a Regex for this:
var stringValue = "aa#aa.com bb#bb.com \"Information\" \"Hi there\"";
var parts = Regex.Matches(stringValue, @"[\""].+?[\""]|[^ ]+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
// parts: aa#aa.com
//        bb#bb.com
//        "Information"
//        "Hi there"
You can also use the Replace method to remove the " characters.
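A minimal sketch of the Regex approach wrapped in a reusable method, using Trim('"') rather than Replace so that only the surrounding quotes are stripped. The class and method names (QuotedLineTokenizer, Tokenize) are hypothetical, and the sketch assumes that quoted values never contain a double quote themselves.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedLineTokenizer
{
    // Same pattern as above: a double-quoted run (non-greedy), or a run of non-space characters.
    private static readonly Regex TokenPattern = new Regex(@"[\""].+?[\""]|[^ ]+");

    // Splits one input line into tokens and strips the surrounding quotes from each token.
    public static string[] Tokenize(string line)
    {
        return TokenPattern.Matches(line)
            .Cast<Match>()
            .Select(m => m.Value.Trim('"'))
            .ToArray();
    }
}
```

For the first sample line, the returned array can be read as From, To, Subject, and Body at indexes 0 to 3.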

The String.Split() method has an overload that allows you to specify the number of splits required. You can get what you want like this:
Read one line at a time
Call input.Split(new[] { " " }, 3, StringSplitOptions.None) - this returns an array of at most 3 strings. Since email addresses don't have spaces in them, the first two strings will be the from/to addresses, and the third string will be the subject and message. Assume the result of this call is stored in firstSplit[], then firstSplit[0] is the from address, firstSplit[1] is the to address, and firstSplit[2] is the subject and message combined.
Call firstSplit[2].Split(new[] { "\" \"" }, 2, StringSplitOptions.None) - this searches for the string " " (quote, space, quote) in the concatenated subject+message from the previous call, which should pinpoint the separator between the end of the subject and the start of the message. This will give you the subject and message in another array, with the opening quote still attached to the subject and the closing quote still attached to the message. (The double quotes inside the separator are escaped with backslashes.)
This assumes you disallow double quotes in your subject and message. If you do allow double quotes, then you need to ensure you escape them before putting it in the file in the first place.
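The steps above can be sketched as follows. The class and method names (TwoStepSplit, Parse) are hypothetical, and the code relies on the same assumption: no double quotes inside the subject or message.

```csharp
using System;

public static class TwoStepSplit
{
    // Returns { from, to, subject, body } for one line of the input file.
    public static string[] Parse(string line)
    {
        // At most 3 parts: from, to, and the rest (subject and body together).
        string[] firstSplit = line.Split(new[] { " " }, 3, StringSplitOptions.None);

        // Split the rest on the quote-space-quote sequence between subject and body.
        string[] rest = firstSplit[2].Split(new[] { "\" \"" }, 2, StringSplitOptions.None);

        return new[]
        {
            firstSplit[0],
            firstSplit[1],
            rest[0].TrimStart('"'), // drop the opening quote of the subject
            rest[1].TrimEnd('"')    // drop the closing quote of the body
        };
    }
}
```

A malformed line (fewer than three space-separated parts, or no quote-space-quote sequence) would throw an IndexOutOfRangeException here, so real input probably needs a length check before indexing.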

You can do this without regex by using IndexOf and Substring; just put it in a loop if you have multiple lines to parse.
It's not pretty, but it is likely to be faster than Regex if you're doing a lot of them.
string content = @"abba#aa.com dddb#bdd.com ""Information"" ""Hi there""";
int firstSpace = content.IndexOf(" ", StringComparison.Ordinal);
string firstEmail = content.Substring(0, firstSpace);
// Start one character past the first space so the second address has no leading space.
int secondSpace = content.IndexOf(" ", firstSpace + 1, StringComparison.Ordinal);
string secondEmail = content.Substring(firstSpace + 1, secondSpace - firstSpace - 1);
int firstQuote = content.IndexOf("\"", StringComparison.Ordinal);
// Everything from the first quote to the end of the line is subject + message.
string subjectAndMessage = content.Substring(firstQuote);
string[] words = subjectAndMessage.Split(new string[] { "\" \"" }, StringSplitOptions.None);
Console.WriteLine(firstEmail);
Console.WriteLine(secondEmail);
Console.WriteLine(words[0].Remove(0, 1));
Console.WriteLine(words[1].Remove(words[1].Length - 1));
Output:
abba#aa.com
dddb#bdd.com
Information
Hi there

As Spencer pointed out, read the file line by line using the File.ReadAllLines() method and then apply the String.Split() method on white space to each line, using something like this:
string[] elements = line.Split(new char[0]);
UPDATE
Not a pretty solution, but this is how I think it can work:
string[] readText = File.ReadAllLines("input.txt");
foreach (string line in readText)
{
    string[] elements = line.Split(new char[0]);
    // Take the values of the first 3 fields by simple elements[index] (index: 0-2).
    // Join the remaining fields back together, separated by spaces.
    string temp = "";
    for (int i = 3; i < elements.Length; i++)
    {
        temp += elements[i] + " ";
    }
}

Requires reference to Microsoft.VisualBasic, but a bit more reliable than Regex:
using (var tfp = new Microsoft.VisualBasic.FileIO.TextFieldParser("input.txt")) {
    for (tfp.SetDelimiters(" "); !tfp.EndOfData;) {
        string[] fields = tfp.ReadFields();
        Debug.Print(string.Join(",", fields)); // "aa#aa.com,bb#bb.com,Information,Hi there"
    }
}

Related

How can I remove the spaces that appear between the words even after splitting the string? [duplicate]

I have the following input:
string txt = " i am a string ";
I want to remove the spaces from the start and end of the string.
The result should be: "i am a string"
How can I do this in c#?
String.Trim
Removes all leading and trailing white-space characters from the current String object.
Usage:
txt = txt.Trim();
If this isn't working, then it is highly likely that the "spaces" aren't spaces but some other non-printing or white-space character, possibly tabs. In this case you need to use the String.Trim overload that takes an array of characters:
char[] charsToTrim = { ' ', '\t' };
string result = txt.Trim(charsToTrim);
You can add to this list as and when you come across more space like characters that are in your input data. Storing this list of characters in your database or configuration file would also mean that you don't have to rebuild your application each time you come across a new character to check for.
NOTE
As of .NET 4 .Trim() removes any character that Char.IsWhiteSpace returns true for so it should work for most cases you come across. Given this, it's probably not a good idea to replace this call with the one that takes a list of characters you have to maintain.
It would be better to call the default .Trim() and then call the method with your list of characters.
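Calling the two Trim overloads one after the other can leave characters behind when white space and custom characters are interleaved at one end of the string. A minimal sketch of a single-pass alternative is shown here; the names TrimHelper, TrimAll, and ExtraCharsToTrim are hypothetical, and the zero-width space is only an illustrative choice of extra character.

```csharp
using System;

public static class TrimHelper
{
    // Hypothetical list of extra characters to strip. U+200B (zero-width space)
    // is used only as an illustration of a character you might add.
    private static readonly char[] ExtraCharsToTrim = { '\u200B' };

    private static bool ShouldTrim(char c)
    {
        return char.IsWhiteSpace(c) || Array.IndexOf(ExtraCharsToTrim, c) >= 0;
    }

    // Removes white space and the extra characters from both ends in one pass,
    // so the result does not depend on the order in which they appear.
    public static string TrimAll(string text)
    {
        int start = 0;
        int end = text.Length;
        while (start < end && ShouldTrim(text[start])) start++;
        while (end > start && ShouldTrim(text[end - 1])) end--;
        return text.Substring(start, end - start);
    }
}
```

The extra characters could equally be loaded from configuration, as suggested above, instead of being hard-coded.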
You can use:
String.TrimStart - Removes all leading occurrences of a set of characters specified in an array from the current String object.
String.TrimEnd - Removes all trailing occurrences of a set of characters specified in an array from the current String object.
String.Trim - combination of the two functions above
Usage:
string txt = " i am a string ";
char[] charsToTrim = { ' ' };
txt = txt.Trim(charsToTrim); // txt = "i am a string"
EDIT:
txt = txt.Replace(" ", ""); // txt = "iamastring"
I really don't understand some of the hoops the other answers are jumping through.
var myString = " this is my String ";
var newstring = myString.Trim(); // results in "this is my String"
var noSpaceString = myString.Replace(" ", ""); // results in "thisismyString";
It's not rocket science.
txt = txt.Trim();
Or you can split your string into a string array on spaces and then append every item of the array to an empty string.
This may not be the best or fastest method, but you can try it if the other answers aren't what you want.
text.Trim() is to be used
string txt = " i am a string ";
txt = txt.Trim();
Use the Trim method.
static void Main()
{
    // A.
    // Example strings with multiple whitespaces.
    string s1 = "He saw a cute\tdog.";
    string s2 = "There\n\twas another sentence.";
    // B.
    // Create the Regex.
    Regex r = new Regex(@"\s+");
    // C.
    // Strip multiple spaces.
    string s3 = r.Replace(s1, @" ");
    Console.WriteLine(s3);
    // D.
    // Strip multiple spaces.
    string s4 = r.Replace(s2, @" ");
    Console.WriteLine(s4);
    Console.ReadLine();
}
OUTPUT:
He saw a cute dog.
There was another sentence.
You Can Use
string txt = " i am a string ";
txt = txt.TrimStart().TrimEnd();
Output is "i am a string"

C# Coding using strings in order to find a full name

I am using Visual Studio to create a C# Windows Forms application that finds the suffix, first name, and last name of the user. I am using string.Split to split at the first space, but it only gives me the text from the first space onward. If the user inputs " Mr. Donald duck " with extra spaces, I cannot make it work, for example:
"Mr. -5 spaces- Donald -5spaces- Duck"
The code doesn't read past the first space.
Any suggestions?
Trimming is only going to take care of leading and trailing white-space characters. Here's what you need in order to get just the 3 useful parts of the text when you have all those extra spaces between words:
string name = "Mr.     Donald     Duck";
string[] split = name.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
The string array will contain 3 items: Mr., Donald, and Duck. The StringSplitOptions.RemoveEmptyEntries will take care of repeating white-space when you split the original string.
Without it, you get something like this: Mr., , , , , Donald, , , , , Duck
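A short sketch of the same idea as a helper that also copes with tabs and with leading or trailing white space. The names NameSplitter and SplitName are hypothetical. Passing a null separator array makes Split treat every white-space character as a delimiter.

```csharp
using System;

public static class NameSplitter
{
    // Splits on any run of white space and drops the empty entries,
    // so repeated, leading, and trailing spaces never produce empty items.
    public static string[] SplitName(string input)
    {
        return input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Because the number of parts depends on what the user typed, it is probably worth checking the array length before reading the suffix, first name, and last name from indexes 0 to 2.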
You should always use the String.Trim() method (to remove leading and trailing white space) when you deal with user input as a string.
string s = " Mr. Donald duck ";
// Split string on spaces.
// ... This will separate all the words.
string[] words = s.Trim().Split(' ');
// Check the size of the array.
if (words.Length == 3)
{
    string suffix = words[0];
    string firstname = words[1];
    string lastname = words[2];
}
I don't understand the "-5" in your question, but I hope this helps.
Split with remove empty string option, then you will get non empty word array as result. From that you can get name parts.
The syntax for String.Split would be like this:
//             0    1     2
//            ooo|oooooo|oooo
string str = "Mr. Donald Duck";
string suffix = str.Split(' ')[0];
string fname = str.Split(' ')[1];
string lname = str.Split(' ')[2];
Just for explanation
According to MSDN, you can easily remove white space from both ends of a string by using the String.Trim method.
string input = Console.ReadLine();
// This will remove white spaces from both ends and split on the basis of spaces in string.
string[] tokens = input.Trim().Split(' ');
string title = tokens[0];
string firstname = tokens[1];
string secondname = tokens[2];

Splitting string on multi-character delimiter

string IdStr = "ID03I010102010210AEMPD4677EID03I020102020208L8159734ID03I030102030210IPS1406974PT03T010109981815938030202PT03T0201109899488666030201PT03T0301109818159381030203PT03T040112919818159381030201";
string[] stringSeparators = new string[] { "ID03I0" };
string[] result;
result = IdStr.Split(stringSeparators, StringSplitOptions.RemoveEmptyEntries);
This is the result:
result[0]=10102010210AEMPD4677E
result[1]=20102020208L8159734
result[2]=30102030210IPS1406974PT03T010109981815938030202PT03T0201109899488666030201PT03T0301109818159381030203PT03T040112919818159381030201
Desired result:
result[0]=ID03I010102010210AEMPD4677E
result[1]=ID03I020102020208L8159734
result[2]=ID03I030102030210IPS1406974PT03T010109981815938030202PT03T0201109899488666030201PT03T0301109818159381030203PT03T040112919818159381030201
As you can see I want to include delimiter ID03I0 to the elements.
NOTE: I know I can include it by hardcoding it. But that's not the way I want to do it.
result = IdStr.Split(stringSeparators, StringSplitOptions.RemoveEmptyEntries)
.Select(x => stringSeparators[0] + x).ToArray();
This adds the separator to the beginning of every element in your array.
EDIT: Unfortunately, with this approach you are limited to a single delimiter. If you need more than one, use Regex instead.
Following Regex pattern should work.
string input = "ID03I010102010210AEMPD4677EID03I020102020208L8159734ID03I030102030210IPS1406974PT03T010109981815938030202PT03T0201109899488666030201PT03T0301109818159381030203PT03T040112919818159381030201";
string delimiter = "ID03I0";//Modify it as you need
string pattern = string.Format("(?<=.)(?={0})", delimiter);
string[] result = Regex.Split(input, pattern);
Adapted from this answer.
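A sketch of the same lookahead split as a reusable method. The names DelimiterKeepingSplit and SplitKeepingDelimiter are hypothetical. The one addition to the pattern above is Regex.Escape, which matters if the delimiter ever contains regex metacharacters such as "." or "+".

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterKeepingSplit
{
    // Splits input in front of every occurrence of delimiter and keeps the
    // delimiter at the start of each resulting element.
    public static string[] SplitKeepingDelimiter(string input, string delimiter)
    {
        // (?<=.) requires a preceding character, which avoids an empty first
        // element when the input itself starts with the delimiter.
        // (?=...) is a zero-width lookahead, so the delimiter is not consumed.
        string pattern = "(?<=.)(?=" + Regex.Escape(delimiter) + ")";
        return Regex.Split(input, pattern);
    }
}
```

Because both parts of the pattern are zero-width, nothing is removed from the input; concatenating the returned elements gives back the original string.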

C# Remove Part of a String with only knowledge of start and end of the part

This is actual example of what I want to accomplish:
I have this string :
File Name="Unstuck20140608124131432.txt"
Path="Unstuck20140608124131432.txt" Status="Passed" Duration="0.44"
And i want to cut the "Path" attribute from it, so it will look like this:
File Name="Unstuck20140608124131432.txt" Status="Passed"
Duration="0.44"
I don't know anything about the length of the path or the characters inside its quotes.
How can I accomplish this?
You can use Regex.Replace
string input = @"File Name=""Unstuck20140608124131432.txt"" Path=""Unstuck20140608124131432.txt"" Status=""Passed"" Duration=""0.44""";
var output = Regex.Replace(input, @"Path=\"".+?\""", "");
And for you non-regex fans out there, you can use the split command. (Nothing against regex. It is an important part of a balanced programmer diet.)
var input = "File Name=\"Unstuck20140608124131432.txt\" Path=\"Unstuck20140608124131432.txt\" Status=\"Passed\" Duration=\"0.44\"";
var tmp = input.Split(new[] { "Path=\"" }, 2, StringSplitOptions.None);
var result = tmp[0] + tmp[1].Split(new[] { '"' }, 2)[1];
Split the string into 2 parts based on the start of your pattern (Path=").
Take the first part.
Split the 2nd part into 2 parts based on the end of the pattern ("). Take the 2nd part of that.
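A sketch of a generalised Regex variant that takes the attribute name as a parameter and also removes the white space in front of the attribute, so no double space is left behind. The names AttributeRemover and RemoveAttribute are hypothetical, and the code assumes attribute values are always enclosed in double quotes and contain no double quotes themselves.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeRemover
{
    // Removes name="..." (and any white space directly before it) from input.
    public static string RemoveAttribute(string input, string name)
    {
        // \b keeps "Path" from matching inside a longer name such as "FilePath".
        // [^"]* matches the value up to the closing quote, whatever its length.
        string pattern = @"\s*\b" + Regex.Escape(name) + @"=""[^""]*""";
        return Regex.Replace(input, pattern, "");
    }
}
```

Calling RemoveAttribute(input, "Path") on the string from the question leaves the File Name, Status, and Duration attributes separated by single spaces.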

How to indicate whitespaces while reading from a .txt file

I have a simple .txt file with X,Y-values in it. It is structured like this:
-25.7754 35.87
-22.1233 32.16
-20.361 30.75
etc.
I am able to read single lines or the whole text to the end, with objstream.ReadToEnd(); & objstream.ReadLine().
But here's my question: how can I tell where the first value ends, so I can parse it to a float and then proceed to read the next value?
Here is the read functionality I have so far :)
StreamReader objStream = new StreamReader("C:blablabla\\Text.asc");
textBox1.Text = objStream.ReadLine();
Thanks in advance,
BC++
Use String.Split()
As requested, an example :
string s = "there is a cat";
//
// Split string on spaces.
// ... This will separate all the words.
//
string[] words = s.Split(' ');
foreach (string word in words)
{
Console.WriteLine(word);
}
The output is :
there
is
a
cat
Look at the string.Split methods:
var line1 = objStream.ReadLine();
var lineParts = line1.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
textBox1.Text = lineParts[0];
textBox2.Text = lineParts[1];
Note the use of an overload that takes StringSplitOptions.RemoveEmptyEntries - this means that if you have multiple spaces in succession, the result will not contain empty entries.
If you really mean white-space and not space then you have to go this way:
string line = "-25.7754 35.87";
string[] values = line.Split(new char[] { }, StringSplitOptions.RemoveEmptyEntries);
The difference from the other answers is the separator argument. If no separator characters are given, white-space characters are assumed to be the delimiters. In other words, you will get the same result for
string line = "-25.7754\t35.87"; // tab instead of spaces.
This gives you the flexibility to correctly split space-padded or tab-delimited lines using the same code.
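To cover the parsing step the question asks about, here is a minimal sketch that splits one line on white space and converts both values to float. The names PointParser and ParseLine are hypothetical. CultureInfo.InvariantCulture is an assumption based on the sample data, which uses "." as the decimal separator.

```csharp
using System;
using System.Globalization;

public static class PointParser
{
    // Parses a line such as "-25.7754 35.87" into { x, y }.
    public static float[] ParseLine(string line)
    {
        // A null separator array splits on any white-space character (spaces or tabs).
        string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // InvariantCulture makes "." the decimal separator regardless of the
        // regional settings of the machine running the code.
        float x = float.Parse(parts[0], CultureInfo.InvariantCulture);
        float y = float.Parse(parts[1], CultureInfo.InvariantCulture);
        return new[] { x, y };
    }
}
```

This can be called once per line returned by objStream.ReadLine() until ReadLine() returns null; float.TryParse is an option if the file may contain malformed lines.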
