C# string.Split(): separate a string by uppercase letters

I've been using the Split() method to split strings, but it only works if I pass string.Split() a separator character. Is there any way to split a string wherever an uppercase letter appears?
For example, is it possible to get the individual words out of an unseparated string like:
DeleteSensorFromTemplate
so that the result is:
Delete Sensor From Template

Use Regex.Split:
using System.Text.RegularExpressions;
string[] split = Regex.Split(str, @"(?<!^)(?=[A-Z])");
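To get from the split pieces back to the spaced phrase the question asks for, the array can simply be joined with spaces. A minimal sketch, assuming plain ASCII capitals; the class and method names (`CamelCaseSplitter`, `SplitBeforeUppercase`, `ToWords`) are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CamelCaseSplitter
{
    // Split before every ASCII capital except the one at position 0:
    // (?<!^) means "not at the start", (?=[A-Z]) means "an uppercase letter follows".
    // Both are zero-width, so no characters are consumed by the split.
    public static string[] SplitBeforeUppercase(string input) =>
        Regex.Split(input, @"(?<!^)(?=[A-Z])");

    // Rejoin the pieces with single spaces to get a readable phrase.
    public static string ToWords(string input) =>
        string.Join(" ", SplitBeforeUppercase(input));
}
```

With this pattern, `CamelCaseSplitter.ToWords("DeleteSensorFromTemplate")` returns "Delete Sensor From Template". Because the lookahead fires before every capital, an acronym such as "UPC" comes out as "U P C".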

Another way with regex:
public static string SplitCamelCase(string input)
{
    return System.Text.RegularExpressions.Regex.Replace(input, "([A-Z])", " $1", System.Text.RegularExpressions.RegexOptions.Compiled).Trim();
}

If you do not like RegEx and you really just want to insert the missing spaces, this will do the job too:
using System.Text;

// Extension methods must be declared in a non-nested static class
public static class StringExtensions
{
    public static string InsertSpaceBeforeUpperCase(this string str)
    {
        var sb = new StringBuilder();
        char previousChar = char.MinValue; // Unicode '\0'
        foreach (char c in str)
        {
            if (char.IsUpper(c))
            {
                // If not the first character and previous character is not a space, insert a space before uppercase
                if (sb.Length != 0 && previousChar != ' ')
                {
                    sb.Append(' ');
                }
            }
            sb.Append(c);
            previousChar = c;
        }
        return sb.ToString();
    }
}

I had some fun with this one and came up with a function that splits by case and also groups runs of capitals (it assumes the word after an acronym is in title case) and runs of digits together.
Examples:
Input -> "TodayIUpdated32UPCCodes"
Output -> "Today I Updated 32 UPC Codes"
Code (please excuse the funky symbols I use)...
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Extension methods must be static and declared in a static class
public static class StringExtensions
{
    public static string[] SplitByCase(this string s) {
        var ʀ = new List<string>();
        var ᴛ = new StringBuilder();
        var previous = SplitByCaseModes.None;
        foreach(var ɪ in s) {
            SplitByCaseModes mode_ɪ;
            if(string.IsNullOrWhiteSpace(ɪ.ToString())) {
                mode_ɪ = SplitByCaseModes.WhiteSpace;
            } else if("0123456789".Contains(ɪ)) {
                mode_ɪ = SplitByCaseModes.Digit;
            } else if(ɪ == ɪ.ToString().ToUpper()[0]) {
                mode_ɪ = SplitByCaseModes.UpperCase;
            } else {
                mode_ɪ = SplitByCaseModes.LowerCase;
            }
            if((previous == SplitByCaseModes.None) || (previous == mode_ɪ)) {
                ᴛ.Append(ɪ);
            } else if((previous == SplitByCaseModes.UpperCase) && (mode_ɪ == SplitByCaseModes.LowerCase)) {
                // The last capital starts the next word: emit the acronym before it
                if(ᴛ.Length > 1) {
                    ʀ.Add(ᴛ.ToString().Substring(0, ᴛ.Length - 1));
                    ᴛ.Remove(0, ᴛ.Length - 1);
                }
                ᴛ.Append(ɪ);
            } else {
                ʀ.Add(ᴛ.ToString());
                ᴛ.Clear();
                ᴛ.Append(ɪ);
            }
            previous = mode_ɪ;
        }
        if(ᴛ.Length != 0) ʀ.Add(ᴛ.ToString());
        return ʀ.ToArray();
    }
    private enum SplitByCaseModes { None, WhiteSpace, Digit, UpperCase, LowerCase }
}
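The same grouping rules (acronyms kept together, digits grouped, the last capital of an acronym starting the next title-case word) can also be sketched as a single regular expression. This is an alternative technique, not the function above; `CaseGroupSplitter` is a hypothetical name, and the sketch only recognizes ASCII letters and digits, silently dropping whitespace and punctuation instead of emitting them as separate entries:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CaseGroupSplitter
{
    // Alternatives are tried left to right at each position (ASCII only):
    //   [A-Z]+(?![a-z])  a run of capitals NOT followed by a lowercase letter (an acronym like "UPC")
    //   [A-Z][a-z]*      a capital followed by lowercase letters ("Today", "Codes")
    //   [a-z]+           a lowercase run with no leading capital
    //   [0-9]+           a run of digits ("32")
    private static readonly Regex Token =
        new Regex("[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|[0-9]+");

    // Returns every token in order; characters matching none of the alternatives are skipped.
    public static string[] Split(string input) =>
        Token.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
}
```

Joining the result of `CaseGroupSplitter.Split("TodayIUpdated32UPCCodes")` with spaces gives "Today I Updated 32 UPC Codes". The `(?![a-z])` lookahead is what makes "UPCCodes" come out as "UPC" + "Codes": the greedy run "UPCC" is rejected because a lowercase letter follows it, so the engine backs off by one capital.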

Here's another way, if you don't want to use a StringBuilder or Regex (both totally acceptable answers). I just want to offer a different solution:
string Split(string input)
{
    string result = "";
    for (int i = 0; i < input.Length; i++)
    {
        if (char.IsUpper(input[i]))
        {
            result += ' ';
        }
        result += input[i];
    }
    return result.Trim();
}
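Because `result += ...` copies the whole string on every iteration, the loop above does quadratic work on long inputs. A LINQ sketch that still avoids an explicit StringBuilder or Regex produces the pieces lazily and concatenates them in one call; `SpaceInserter` and `InsertSpaces` are hypothetical names:

```csharp
using System.Linq;

public static class SpaceInserter
{
    // Prefix every uppercase character with a space, concatenate all pieces in a
    // single string.Concat call, then trim the leading space that appears when the
    // input starts with a capital (mirroring the Trim() in the loop version).
    public static string InsertSpaces(string input) =>
        string.Concat(input.Select(c => char.IsUpper(c) ? " " + c : c.ToString()))
              .Trim();
}
```

`SpaceInserter.InsertSpaces("DeleteSensorFromTemplate")` returns "Delete Sensor From Template", the same as the loop version.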

Related

How can I trim strings down to the first occurrence of a ";" or a "["

I have C# strings that look like this:
"かたむく;かたぶく[ok]"
"そば[側,傍];そく[側];はた"
"くすり"
"おととい[gikun];おとつい[gikun];いっさくじつ"
How can I trim these down so that the output contains only the text up to the first occurrence of a ";" character (not a normal semicolon) or a "[", and, if neither is present, the new string is the same as the existing one?
"かたむく"
"そば"
"くすり"
"おととい"
Is this best done with a Regex, or should I use IndexOf-style code?
You don't need a Regex, just string.IndexOfAny. Something like:
using System.Diagnostics; // for Debug.WriteLine

var inputs = new[]
{
    "かたむく;かたぶく[ok]",
    "そば[側,傍];そく[側];はた",
    "くすり",
    "おととい[gikun];おとつい[gikun];いっさくじつ"
};
var separators = new[] {' ', '['};
foreach (var input in inputs)
{
    var separatorPosition = input.IndexOfAny(separators);
    if (separatorPosition >= 0)
    {
        Debug.WriteLine($"Split: {input.Substring(0, separatorPosition)}");
    }
    else
    {
        Debug.WriteLine($"No Split: {input}");
    }
}
I get the following output from your inputs:
Split: かたむく;かたぶく
Split: そば
No Split: くすり
Split: おととい
It doesn't quite match what you show, but I think it's correct (and what you show isn't)
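If the "not a normal semicolon" in the question is the full-width semicolon U+FF1B ('；'), which is an assumption since the scraped inputs above only show an ASCII ';', then it has to be listed explicitly among the IndexOfAny separators. A sketch that stops at '；', ';' or '['; `ReadingTrimmer` and `UpToFirstStop` are hypothetical names:

```csharp
using System;

public static class ReadingTrimmer
{
    // Assumption: the question's separator is the full-width semicolon U+FF1B.
    // The ASCII ';' and '[' are included as well.
    private static readonly char[] Stops = { '\uFF1B', ';', '[' };

    // Returns the text before the first stop character, or the input unchanged
    // when none of them occurs (null stays null).
    public static string UpToFirstStop(string input)
    {
        if (input == null) return null;
        int i = input.IndexOfAny(Stops);
        return i >= 0 ? input.Substring(0, i) : input;
    }
}
```

With those separators, all four sample inputs come out as the expected かたむく, そば, くすり and おととい, whichever kind of semicolon they actually contain.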
Expanding on my comment, "IndexOf can be used to find the first index of the [ character, and Substring can return the string up to that point."
public static string GetSubstringToChar(string input, char delimiter = '[')
{
    if (input == null || !input.Contains(delimiter)) return input;
    return input.Substring(0, input.IndexOf(delimiter));
}
To make this work with multiple delimiters, we can pass in an array of delimiter characters and use IndexOfAny:
public static string GetSubstringToChar(string input, char[] delimiters)
{
    // Any() needs using System.Linq
    if (input == null || !input.Any(delimiters.Contains)) return input;
    return input.Substring(0, input.IndexOfAny(delimiters));
}
You could then call this like:
var strings = new List<string>
{
    "かたむく;かたぶく[ok]",
    "そば[側,傍];そく[側];はた",
    "くすり",
    "おととい[gikun];おとつい[gikun];いっさくじつ",
};
var delimiters = new[] { ';', '[' };
foreach (var str in strings)
{
    Console.WriteLine(GetSubstringToChar(str, delimiters));
}
An extension method with a little validation will do the job.
// Extension method: declare it inside a static class
public static string GetUntil(this string input, char[] delimiters)
{
    if (input == null || input.IndexOfAny(delimiters) == -1)
        return input;
    else
        return input.Split(delimiters)[0];
}
Then call it like this:
var test = "かたむく;かたぶく[ok]".GetUntil(new char[] { ' ', '[' });

C# string parse

I have a string like this:
string temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL,NBLD,NITA,NUND','','Address line 2'"
Each single-quoted value is a field, and the fields are delimited by commas. I want to empty the 8th field in the string. I can't simply call Replace("MUL,NBLD,NITA,NUND", "") because that field could contain anything. Also note that the 4th field is a number, so the 5 has no single quotes around it.
How can I achieve this?
using System.Collections.Generic;
using System.Linq;

static void Main()
{
    var temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL,NBLD,NITA,NUND','','Address line 2'";
    var parts = Split(temp).ToArray();
    parts[7] = null;
    var ret = string.Join(",", parts);
    // or replace the above 3 lines with this...
    //var ret = string.Join(",", Split(temp).Select((v,i)=>i!=7 ? v : null));
    //ret == "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40',,'','Address line 2'"
}
public static IEnumerable<string> Split(string input, char delimiter = ',', char quote = '\'')
{
    string temp = "";
    bool skipDelimiter = false;
    foreach (var c in input)
    {
        if (c == quote)
            skipDelimiter = !skipDelimiter; // entering or leaving a quoted field
        else if (c == delimiter && !skipDelimiter)
        {
            //do split
            yield return temp;
            temp = "";
            continue;
        }
        temp += c;
    }
    yield return temp;
}
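The Split-based answer above writes null, so the 8th field becomes a bare ",," in the output. If the field should stay quoted, the same quote-aware scan can put an empty quoted value ('') in its place instead. A sketch assuming balanced quotes and no escaped quote characters inside fields; `FieldEditor`, `SplitQuoted` and `EmptyField` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text;

public static class FieldEditor
{
    // Splits on the delimiter, ignoring delimiters inside quotes; the quote
    // characters themselves are kept as part of each field.
    // Assumes quotes are balanced and never escaped inside a field.
    public static List<string> SplitQuoted(string record, char delimiter = ',', char quote = '\'')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        foreach (char c in record)
        {
            if (c == quote)
                inQuotes = !inQuotes;
            else if (c == delimiter && !inQuotes)
            {
                fields.Add(current.ToString());
                current.Clear();
                continue;
            }
            current.Append(c);
        }
        fields.Add(current.ToString());
        return fields;
    }

    // Replaces the zero-based field `index` with an empty quoted value ('').
    // Throws ArgumentOutOfRangeException if the record has fewer fields.
    public static string EmptyField(string record, int index, char delimiter = ',', char quote = '\'')
    {
        List<string> fields = SplitQuoted(record, delimiter, quote);
        fields[index] = new string(quote, 2);
        return string.Join(delimiter.ToString(), fields);
    }
}
```

`FieldEditor.EmptyField(temp, 7)` (zero-based, so the 8th field) returns `'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','','','Address line 2'` for the question's string.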
I made a small implementation below and explain the logic in the comments. Basically, you want to write a simple parser to accomplish what you described.
Edit 0: I just realized I did the opposite of what you asked for, oops. Fixed now.
Edit 1: the field's contents are now replaced with an empty string instead of the entire field being removed from the comma-delimited list.
using System;

static void Main(string[] args)
{
    string temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL,NBLD,NITA,NUND','','Address line 2'";
    //keep track of the single quotes
    int singleQuoteCount = 0;
    //keep track of commas
    int comma_count = 0;
    String field = "";
    foreach (Char chr in temp)
    {
        //add to the field string unless we are between the 7th and 8th comma (not counting commas between single quotes)
        if (comma_count != 7)
            field += chr;
        //plug in an empty string between two single quotes instead of whatever chars are in the eighth field
        else if (chr == '\'' && singleQuoteCount % 2 == 1)
            field += "\'',";
        if (chr == '\'') singleQuoteCount++;
        //only add to comma_count if we are outside of single quotes
        if (singleQuoteCount % 2 == 0 && chr == ',') comma_count++;
    }
    // prints 'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','','','Address line 2'
    Console.WriteLine(field);
}
If you used '-' (or another character) instead of ',' inside the fields (e.g. 'MUL-NBLD-NITA-NUND'), you could use this code, which removes the 8th field entirely rather than emptying it:
static void Main(string[] args)
{
    string temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL-NBLD-NITA-NUND','','Address line 2'";
    temp = replaceField(temp, 8);
}
static string replaceField(string list, int field)
{
    string[] fields = list.Split(',');
    string chosenField = fields[field - 1 /*<--Arrays start at 0!*/];
    if(!(field == fields.Length))
        list = list.Replace(chosenField + ",", "");
    else
        list = list.Replace("," + chosenField, "");
    return list;
}
//Return-Value: "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','','Address line 2'"

How to split on commas with double quotes in C#?

string strExample =
"\"10553210\",\"na\",\"398,633,000\",\"20130709\",\"20130502\",\"20120724\",";
How can I split the string above on ","?
I need a result like:
string[] arrExample = YourFunc(strExample);
arrExample[0] == "10553210";
arrExample[1] == "na";
arrExample[2] == "398,633,000";
...
ideally using a Split option.
Thanks in advance.
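Here is an easy way:
using System.IO;
using Microsoft.VisualBasic.FileIO;

string[] arrExample;
using (var csvParser = new TextFieldParser(new StringReader(strExample)))
{
    csvParser.Delimiters = new[] { "," };       // split on commas...
    csvParser.HasFieldsEnclosedInQuotes = true; // ...but not on commas inside "..."
    arrExample = csvParser.ReadFields();
}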
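If referencing Microsoft.VisualBasic is not an option, a small hand-written splitter can handle the same case. This is a minimal sketch loosely following the RFC 4180 quoting rules (commas inside double quotes are literal, "" inside a quoted field is an escaped quote), not a full CSV parser: it works on one line at a time and does not handle line breaks inside quoted fields. `CsvLine.Split` is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Minimal single-line splitter (sketch, no multi-line fields):
    // enclosing double quotes are removed, delimiters inside quotes are kept,
    // and "" inside a quoted field becomes a literal ".
    public static string[] Split(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // escaped quote ""
                        i++;                 // skip the second quote
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // includes commas inside quotes
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());      // last field (empty after a trailing comma)
        return fields.ToArray();
    }
}
```

For the question's `strExample`, the result is 10553210, na, 398,633,000, 20130709, 20130502, 20120724, followed by one empty field produced by the trailing comma.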
You can split not on the comma "," but on the whole string "\",\"" (quote, comma, quote).
Don't forget to trim the leading and trailing quotation marks ("):
String strExample =
"\"10553210\",\"na\",\"398,633,000\",\"20130709\",\"20130502\",\"20120724\"";
string[] arrExample = strExample.Trim('"').Split(new String[] {"\",\""}, StringSplitOptions.None);
You can split on "\",\"" and then strip the remaining " from the first and last entries:
// TrimEnd(',') drops the trailing comma in the question's string
string[] arr = strExample.TrimEnd(',').Split(new string[] { "\",\"" },
    StringSplitOptions.None);
//remove the extra quotes from the last and the first entry
arr[0] = arr[0].Substring(1, arr[0].Length - 1);
int last = arr.Length - 1;
arr[last] = arr[last].Substring(0, arr[last].Length - 1);
string[] arrExample = strExample.Split(',');
would do it, but your code won't compile. I assume you meant:
string strExample = "10553210,na,398,633,000,20130709,20130502,20120724";
If this isn't what you meant, please correct the question.
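Assuming you meant this:
string strExample = "\"10553210\",\"na\",\"398,633,000\",\"20130709\",\"20130502\",\"20120724\"";
Split on the quote-comma-quote sequence (splitting on a plain ',' would also break "398,633,000" apart), then Select each part with its stray quotes trimmed:
string[] parts = strExample.Split(new[] { "\",\"" }, StringSplitOptions.None).Select(x => x.Trim('"')).ToArray();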
strExample.Split(',');
You need to escape the double quotes if they're meant to be contained in your example string.
Using the example from Jodrell
private string[] SplitFields(string csvValue)
{
    //if there aren't quotes, use the faster function
    if (!csvValue.Contains('\"') && !csvValue.Contains('\''))
    {
        return csvValue.Trim(',').Split(',');
    }
    else
    {
        //there are quotes, use this built in text parser
        using(var csvParser = new Microsoft.VisualBasic.FileIO.TextFieldParser(new StringReader(csvValue.Trim(','))))
        {
            csvParser.Delimiters = new string[] { "," };
            csvParser.HasFieldsEnclosedInQuotes = true;
            return csvParser.ReadFields();
        }
    }
}
This worked for me:
using System;
using System.Collections.Generic;
using System.Text;

public static IEnumerable<string> SplitCSV(string strInput)
{
    string[] str = strInput.Split(',');
    StringBuilder quoteS = null; // collects the pieces of a quoted field that contained commas
    foreach (string s in str)
    {
        if (s.StartsWith("\""))
        {
            if (s.Length > 1 && s.EndsWith("\""))
            {
                // the whole quoted field fit in one piece
                yield return s;
                continue;
            }
            quoteS = new StringBuilder(s);
            continue;
        }
        if (quoteS != null)
        {
            quoteS.Append($",{s}");
            if (s.EndsWith("\""))
            {
                string s1 = quoteS.ToString();
                quoteS = null;
                yield return s1;
            }
            continue; // don't also emit the closing piece on its own
        }
        yield return s;
    }
}
static void Main(string[] args)
{
    string s = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
    Console.WriteLine(s);
    var sp = SplitCSV(s);
    foreach (string s1 in sp)
    {
        Console.WriteLine(s1);
    }
    Console.ReadKey();
}
You can do that like this:
string stringname = "10553210,na,398,633,000,20130709,20130502,20120724";
List<string> asd = stringname.Split(',').ToList(); // ToList() needs using System.Linq
or, if you want an array:
string[] asd = stringname.Split(',');

Splitting a string array

I have a string array string[] arr, which contains values like N36102W114383, N36102W114382 etc...
I want to split each of these strings so that the values come out like N36082 and W115080.
What is the best way to do this?
This should work for you.
Regex regexObj = new Regex(@"\w\d+"); // matches a character followed by a sequence of digits
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
    Console.WriteLine(matchResults.Value); // two matches: N36102 and W114383
    matchResults = matchResults.NextMatch();
}
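To apply the pattern to every entry of `string[] arr` at once, Regex.Matches can be combined with LINQ's SelectMany. A sketch; `CoordinateSplitter` and `SplitAll` are hypothetical names, and the pattern uses [A-Za-z] rather than \w because \w also matches digits and underscores:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CoordinateSplitter
{
    // One ASCII letter followed by one or more digits, e.g. "N36102" or "W114383".
    private static readonly Regex Part = new Regex(@"[A-Za-z]\d+");

    // Splits every entry of arr into its letter-prefixed parts and flattens
    // the results into one array, preserving order.
    public static string[] SplitAll(string[] arr) =>
        arr.SelectMany(s => Part.Matches(s).Cast<Match>().Select(m => m.Value))
           .ToArray();
}
```

For { "N36102W114383", "N36102W114382" } this yields N36102, W114383, N36102, W114382.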
If the format is always the same, you can just do this:
string[] split_data = data_string.Insert(data_string.IndexOf("W"), ",")
    .Split(',');
This inserts a recognizable delimiter into the string and then splits on it.
Forgive me if this doesn't quite compile, but I'd just break down and write the string processing function by hand:
public static IEnumerable<string> Split(string str)
{
    char [] chars = str.ToCharArray();
    int last = 0;
    for(int i = 1; i < chars.Length; i++) {
        if(char.IsLetter(chars[i])) {
            yield return new string(chars, last, i - last);
            last = i;
        }
    }
    yield return new string(chars, last, chars.Length - last);
}
If you use C#, please try:
String[] code = new Regex("(?:([A-Z][0-9]+))").Split(text).Where(e => e.Length > 0 && e != ",").ToArray();
In case you're only looking for the format NxxxxxWxxxxx, this will do just fine:
Regex r = new Regex(@"(N[0-9]+)(W[0-9]+)");
Match mc = r.Match(arr[i]);
string N = mc.Groups[1].Value;
string W = mc.Groups[2].Value;
Using string.Split and char.IsLetter, this is relatively easy in C#.
Don't forget to write unit tests - the following may have some corner case errors!
// input has form "N36102W114383, N36102W114382"
// output: "N36102", "W114383", "N36102", "W114382", ...
string[] ParseSequenceString(string input)
{
    string[] inputStrings = input.Split(',');
    List<string> outputStrings = new List<string>();
    foreach (string value in inputStrings) {
        // Trim the space left over after ", "
        List<string> valuesInString = ParseValuesInString(value.Trim());
        outputStrings.AddRange(valuesInString);
    }
    return outputStrings.ToArray();
}
// input has form "N36102W114383"
// output: "N36102", "W114383"
List<string> ParseValuesInString(string inputString)
{
    List<string> outputValues = new List<string>();
    string currentValue = string.Empty;
    foreach (char c in inputString)
    {
        // A letter starts a new value, so flush the one collected so far
        if (char.IsLetter(c) && currentValue.Length > 0)
        {
            outputValues.Add(currentValue);
            currentValue = string.Empty;
        }
        currentValue += c;
    }
    outputValues.Add(currentValue);
    return outputValues;
}

String formatting using C#

Is there a way to remove every special character from a string like:
"\r\n 1802 S St Nw<br>\r\n Washington, DC 20009"
And to just write it like:
"1802 S St Nw, Washington, DC 20009"
To remove special characters:
public static string ClearSpecialChars(this string input)
{
    // add any other sequences you want removed to this array
    foreach (var ch in new[] { "\r", "\n", "<br>" })
    {
        input = input.Replace(ch, String.Empty);
    }
    return input;
}
To replace all double spaces with single spaces:
public static string ClearDoubleSpaces(this string input)
{
    while (input.Contains("  ")) // double space
    {
        input = input.Replace("  ", " "); // with single
    }
    return input;
}
You can also combine both methods into a single one:
public static string Clear(this string input)
{
    return input
        .ClearSpecialChars()
        .ClearDoubleSpaces()
        .Trim();
}
Two ways: you can use Regex, or you can use String.Replace(...).
Use the Regex.Replace() method, specifying all of the characters you want to remove as the pattern to match.
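A sketch of that Regex.Replace approach for the address in the question, assuming the <br> tag marks the line break that should become ", " and that all other whitespace (\r, \n, repeated spaces) should collapse to a single space; `AddressCleaner` is a hypothetical name:

```csharp
using System.Text.RegularExpressions;

public static class AddressCleaner
{
    // Assumption: <br> separates address lines and should become ", ".
    public static string Clean(string input)
    {
        // Turn each <br> (or <br/>, any case) plus surrounding whitespace into ", ".
        string s = Regex.Replace(input, @"\s*<br\s*/?>\s*", ", ", RegexOptions.IgnoreCase);
        // Collapse the remaining runs of whitespace (\r, \n, spaces, tabs) into one space.
        s = Regex.Replace(s, @"\s+", " ");
        return s.Trim();
    }
}
```

`AddressCleaner.Clean("\r\n 1802 S St Nw<br>\r\n Washington, DC 20009")` returns "1802 S St Nw, Washington, DC 20009".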
You can use the C# Trim() method; see:
http://msdn.microsoft.com/de-de/library/d4tt83f9%28VS.80%29.aspx
System.Text.RegularExpressions.Regex.Replace("\"\\r\\n 1802 S St Nw<br>\\r\\n Washington, DC 20009\"",
    @"(<br>)*?\\r\\n\s+", "");
Maybe something like this, using ASCII int values. Assumes all html tags will be closed.
using System.Collections.Generic;
using System.Text;

public static class StringExtensions
{
    public static string Clean(this string str)
    {
        string[] split = str.Split(' ');
        List<string> strings = new List<string>();
        foreach (string splitStr in split)
        {
            if (splitStr.Length > 0)
            {
                StringBuilder sb = new StringBuilder();
                bool tagOpened = false;
                foreach (char c in splitStr)
                {
                    int iC = (int)c;
                    if (iC > 32) // skip spaces and control characters such as \r and \n
                    {
                        if (iC == 60) // '<' opens a tag
                            tagOpened = true;
                        if (!tagOpened)
                            sb.Append(c);
                        if (iC == 62) // '>' closes a tag
                            tagOpened = false;
                    }
                }
                string result = sb.ToString();
                if (result.Length > 0)
                    strings.Add(result);
            }
        }
        return string.Join(" ", strings.ToArray());
    }
}
