Using Substring and Regular Expressions to edit a large string

The string, when displayed, looks like: value1, value2, value3, value4, value5, etc.
What I want to display instead is one value per line, with the commas and spaces removed (I assume I can use index + 2 or something similar to get past the comma):
value1
value2
etc...
lastKnownIndexPos = 0;
foreach (System.Text.RegularExpressions.Match m in System.Text.RegularExpressions.Regex.Matches(unformatedList, ",?")) //Currently is ',' can I use ', '?
{
list += unformatedList.Substring(lastKnownIndexPos, m.Index - 1) + "\n\n"; //-1 to grab just the first value.
lastIndex = m.Index + 2; //to skip the comma and the space to get to the first letter of the next word.
//lastIndex++; //used this to count how many times it was found, maxed at 17 (have over 100):(
}
//MessageBox.Show(Convert.ToString(lastIndex)); //used to display total times each was found.
MessageBox.Show(list);
At the moment the message box does not show any text, but lastIndex ends up with a value of 17, so I know part of it works :P

That's easy (I'm using System.Linq here):
var formatted = string.Join("\n\n", unformatedList.Split(',').Select(x => x.Trim()));
MessageBox.Show(formatted);
An alternative approach, as swannee pointed out, would be the following:
var formatted = Regex.Replace(unformatedList, @"\s*,\s*", "\n\n").Trim();
Edit:
To make the above examples work regardless of how you use the result string, you should use Environment.NewLine instead of "\n".
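Here is a minimal sketch of that advice, combining the Split/Trim approach above with Environment.NewLine. The class and method names (ListFormatter, OnePerLine) are made up for illustration, and dropping empty entries is an extra assumption that the original one-liner does not make.

```csharp
using System;
using System.Linq;

public static class ListFormatter
{
    // Splits on commas, trims each item, drops empty entries (an added assumption),
    // and joins the items with the platform's line terminator.
    public static string OnePerLine(string commaSeparated)
    {
        var items = commaSeparated
            .Split(',')
            .Select(item => item.Trim())
            .Where(item => item.Length > 0);
        return string.Join(Environment.NewLine, items);
    }
}
```

Calling MessageBox.Show(ListFormatter.OnePerLine(unformatedList)) would then show one value per line.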

One way is to simply replace the ", " with a newline.
MessageBox.Show( unformatedList.Replace(", ", "\n") );

Or heck, why not just use string.Replace?
var formatted = unformatedList.Replace(", ", "\n\n");

Related

How do I Split string only at last occurrence of special character and use both sides after split

I want to split a string only at last occurrence of special character.
I try to parse a name of a tab from browser, so my initial string looks for example like this:
Untitled - Google Chrome
That is easy to solve as there is a Split function. Here is my implementation:
var pageparts= Regex.Split(inputWindow.ToString(), " - ");
InsertWindowName(pageparts[0].ToString(), pageparts[1].ToString());//method to save string into separate columns in DB
This works, but problem occurs, when I get a page like this:
SQL injection - Wikipedia, the free encyclopedia - Mozilla Firefox
Here are two dashes, which means, that after split is done, there are 3 separate strings in array and if I would continue normally, database would contain in first column value "SQL injection" and in second column value "Wikipedia, the free encyclopedia". Last value will be completely left out.
What I want is that first column in database will have value:
"SQL injection - Wikipedia, the free encyclopedia" and the second column will have:
"Mozilla Firefox". Is that somehow possible?
I tried using Split(" - ").Last() (and LastOrDefault() too), but then I only got the last string. I need both sides of the original string, separated at the last dash.
You can use String.Substring with String.LastIndexOf:
string str = "SQL injection - Wikipedia, the free encyclopedia - Mozilla Firefox";
int lastIndex = str.LastIndexOf('-');
if (lastIndex + 1 < str.Length)
{
string firstPart = str.Substring(0, lastIndex);
string secondPart = str.Substring(lastIndex + 1);
}
Create an extension method (or a simple method) to perform that operation, and also add some error checking for lastIndex.
EDIT:
If you want to split on " - " (space, dash, space), use the following to calculate lastIndex:
string str = "FirstPart - Mozzila Firefox-somethingWithoutSpace";
string delimiter = " - ";
int lastIndex = str.LastIndexOf(delimiter);
if (lastIndex + delimiter.Length < str.Length)
{
string firstPart = str.Substring(0, lastIndex);
string secondPart = str.Substring(lastIndex + delimiter.Length);
}
So for string like:
"FirstPart - Mozzila Firefox-somethingWithoutSpace"
Output would be:
FirstPart
Mozzila Firefox-somethingWithoutSpace
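The extension method suggested above could look roughly like this. It is only a sketch: the names StringSplitExtensions and TrySplitAtLast are hypothetical, and returning false with the whole input in head when the delimiter is missing is one possible design choice among several.

```csharp
using System;

public static class StringSplitExtensions
{
    // Hypothetical helper: splits 'source' at the last occurrence of 'delimiter'.
    // Returns false (with the whole input in 'head') when the delimiter is absent.
    public static bool TrySplitAtLast(this string source, string delimiter,
                                      out string head, out string tail)
    {
        if (string.IsNullOrEmpty(delimiter))
            throw new ArgumentException("Delimiter must not be empty.", nameof(delimiter));

        int index = source.LastIndexOf(delimiter, StringComparison.Ordinal);
        if (index < 0)
        {
            head = source;
            tail = string.Empty;
            return false;
        }

        head = source.Substring(0, index);                 // everything before the last delimiter
        tail = source.Substring(index + delimiter.Length); // everything after it
        return true;
    }
}
```

With the title from the question, title.TrySplitAtLast(" - ", out var page, out var browser) should leave the page title in page and the browser name in browser.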
Please forgive the laziness of this solution; I'm sure there is a better approach, but here is one proposal. I'm assuming you are coding in C#.
First of all, correct me if I have misunderstood the question: you want two columns returned, the first containing all the text before the last dash (even if it includes other dashes) and the second containing all the text after the last dash. If that's right, let's do it.
// I only use Split when I want every piece in a separate array element. In your case I assume you want just two values (if possible), so you can use Substring.
static void Main(string[] args)
{
string firstname = "";
string lastName = "";
string variablewithdata = "SQL injection - Wikipedia, -the free encyclopedia - Mozilla Firefox";
// variablewithdata.LastIndexOf('-') returns the index of the last occurrence of that character.
// It returns -1 when the character is not found, so validate the result before calling Substring.
firstname = variablewithdata.Substring(0, (variablewithdata.LastIndexOf('-') - 1));
lastName = variablewithdata.Substring(variablewithdata.LastIndexOf('-') + 1).Trim(); // Trim drops the space after the dash
Console.WriteLine("FirstColumn: {0} \nLastColumn:{1}",firstname,lastName);
Console.ReadLine();
}
If that's not what you want, can you explain what is supposed to be returned for, say, "SQL injection - Wikipedia,- the free - encyclopedia - Mozilla Firefox"?
Forgive me for the untidy code; I'm bored today.
If you don't mind reassembling the strings, you could use something like:
var pageparts= Regex.Split(inputWindow.ToString(), " - ");
var firstPart = string.Join(" - ", pageparts.Take(pageparts.Length - 1));
var secondPart = pageparts.Last();
InsertWindowName(firstPart, secondPart);

Regex.Split and string.Split not working as expected

I am attempting to split strings using '?' as the delimiter. My code reads data from a CSV file, and certain symbols (like fractions) are not recognized by C#, so I am trying to replace them with a relevant piece of data (bond coupon in this case). I have print statements in the following code (which is embedded in a loop with index variable i) to test the output:
string[] l = lines[i][1].Split('?');
//string[] l = Regex.Split(lines[i][1], @"\?");
System.Console.WriteLine("L IS " + l.Length.ToString() + " LONG");
for (int j = 0; j < l.Length; j++)
System.Console.WriteLine("L["+ j.ToString() + "] IS " + l[j]);
if (l.Length > 1)
{
double cpn = Convert.ToDouble(lines[i][12]);
string couponFrac = (cpn - Math.Floor(cpn)).ToString().Remove(0,1);
lines[i][1] = l[0].Remove(l[0].Length-1) + couponFrac + l[1]; // Recombine, replacing '?' with CPN
}
The issue is that both split methods (string.Split() and Regex.Split() ) produce inconsistent results with some of the string elements in lines splitting correctly and the others not splitting at all (and thus the question mark is still in the string).
Any thoughts? I've looked at similar posts on split methods and they haven't been too helpful.
I had no problem using String.Split. Could you post your input and output?
Failing that, you could try using String.Replace to replace the '?' with a character that does not occur in the string, and then call String.Split on that character for the same effect (just something to try).
I didn't have any trouble parsing the following.
var qsv = "now?is?the?time";
var keywords = qsv.Split('?');
keywords.Dump();
UPDATE:
There doesn't appear to be any problem with Split. There is a problem somewhere else because in this small scale test it works just fine. I would suggest you use LinqPad to test out these kinds of scenarios small scale.
var qsv = "TII 0 ? 04/15/15";
var keywords = qsv.Split('?');
keywords.Dump();
qsv = "TII 0 ? 01/15/22";
keywords = qsv.Split('?');
keywords.Dump();
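Since Split itself behaves correctly, one thing worth checking (this is a guess, not something the question confirms) is whether the characters that display as '?' really are the question mark U+003F. A fraction symbol that the console cannot render, or a replacement character left by a wrong file encoding, can look like '?' without matching it. The sketch below uses a made-up helper, CharInspector.Describe, to print each character with its code unit.

```csharp
using System;
using System.Linq;

public static class CharInspector
{
    // Lists each character next to its UTF-16 code unit in hex, e.g. "?=U+003F".
    public static string Describe(string text)
    {
        return string.Join(" ", text.Select(c => c + "=U+" + ((int)c).ToString("X4")));
    }
}
```

Running System.Console.WriteLine(CharInspector.Describe(lines[i][1])) on a row that fails to split would show whether the suspect character is U+003F or something else.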

How to extract range of characters from a string

If I have a string such as the following:
String myString = "SET(someRandomName, \"hi\", u)";
where I know that "SET(" will always exist in the string, but the length of "someRandomName" is unknown, how would I go about deleting all the characters from "(" to the first instance of the " character? So, to reiterate, I would like to delete this substring: "SET(someRandomName, \"" from myString.
How would I do this in C#.Net?
EDIT: I don't want to use regex for this.
Provided the string always has this structure, the easiest way is to use String.IndexOf() to look up the index of the first occurrence of ". String.Substring() then gives you the appropriate portion of the original string.
Likewise you can use String.LastIndexOf() to find the index of the first " from the end of the string. Then you will be able to extract just the value of the second argument ("hi" in your sample).
You will end up with something like this:
int begin = myString.IndexOf('"');
int end = myString.LastIndexOf('"');
string secondArg = myString.Substring(begin, end - begin + 1);
This will yield "\"hi\"" in secondArg.
UPDATE: To remove a portion of the string, use the String.Remove() method:
int begin = myString.IndexOf('(');
int end = myString.IndexOf('"');
string altered = myString.Remove(begin + 1, end - begin - 1);
This will yield "SET(\"hi\", u)" in altered.
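The IndexOf/Remove approach can be wrapped in a small helper that also copes with a missing marker. This is only a sketch; StringTrimming.RemoveBetween is a hypothetical name, and returning the input unchanged when a marker is absent is an assumption about what the caller would want.

```csharp
using System;

public static class StringTrimming
{
    // Hypothetical helper: removes the text strictly between the first 'open'
    // character and the first 'close' character that follows it.
    // Returns the input unchanged when either marker is missing.
    public static string RemoveBetween(string text, char open, char close)
    {
        int begin = text.IndexOf(open);
        if (begin < 0) return text;

        int end = text.IndexOf(close, begin + 1);
        if (end < 0) return text;

        return text.Remove(begin + 1, end - begin - 1);
    }
}
```

For the string in the question, StringTrimming.RemoveBetween(myString, '(', '"') gives the same result as the three lines above.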
I know it's been years, but .NET has also evolved in the meantime.
Consider using the range operator, in case anyone is looking here for an answer.
Assuming that SET( and "hi", u) are constant (4 and 8 characters respectively, not counting the escape backslashes):
var sub = myString[4..^8];
myString = myString.Replace(sub, replaceValue);
More examples and a good explanation can be found in this article or, of course, in the Microsoft docs.
This is pretty awful, but this will accomplish what you want with a simple linq statement. Just presenting as an alternative to the IndexOf answers.
string myString = "SET(someRandomName, \"hi\", 0)";
string fixedStr = new String( myString.ToCharArray().Take( 4 ).Concat( myString.ToCharArray().SkipWhile( c => c != '"' ) ).ToArray() );
yields: SET("hi", 0)
Note: the Take is hard-coded to 4 characters; you could alter it to take the leading characters that appear in an array of allowed characters instead.
I assume you want to transform
SET(someRandomName, "hi", u)
into:
SET(u)
To achieve that, you can use:
String newString = "SET(" + myString.Substring(myString.LastIndexOf(',') + 1).Trim();
To explain this bit by bit:
myString.LastIndexOf(',')
will give you the index (position) of your last , character. Increment it by 1 to get the start index of the third argument in your SET function.
myString.Substring(myString.LastIndexOf(',') + 1)
The Substring method will eliminate all characters up to the specified position. In this case, we’re eliminating everything up to (and including) the last ,. In the example above, this would eliminate the SET(someRandomName, "hi", part, and leave us with u).
The Trim is necessary simply to remove the leading space character before your u.
Finally, we prepend SET( to our substring (since we had formerly removed it due to our Substring).
Edit: Based on your comment below (which contradicts what you asked in your question), you can use:
String newString = "SET(" + myString.Substring(myString.IndexOf(',') + 1).Trim();

C# how to split a string backwards?

What I'm trying to do is split a string backwards, meaning right to left.
string startingString = "<span class=\"address\">Hoopeston,, IL 60942</span><br>";
What I would do normally is this.
string[] splitStarting = startingString.Split('>');
so my splitStarting[1] would = "Hoopeston,, IL 60942</span"
then I would do
string[] splitAgain = splitStarting[1].Split('<');
so splitAgain[0] would = "Hoopeston,, IL 60942"
Now this is what I want to do, I want to split by ' ' (a space) reversed for the last 2 instances of ' '.
For example my array would come back like so:
[0]="60942"
[1]="IL"
[2] = "Hoopeston,,"
To make this even harder I only ever want the first two reverse splits, so normally I would do something like this
string[] splitCityZip = splitAgain[0].Split(new char[] { ' ' }, 3);
but how would you do that backwards? The reason is that the city could have a two-word name, so an extra ' ' would break the city name.
A regular expression with named groups makes things much simpler. No need to reverse strings. Just pluck out what you want.
var pattern = @">(?<city>.*) (?<state>.*) (?<zip>.*?)<";
var expression = new Regex(pattern);
Match m = expression.Match(startingString);
if(m.Success){
Console.WriteLine("Zip: " + m.Groups["zip"].Value);
Console.WriteLine("State: " + m.Groups["state"].Value);
Console.WriteLine("City: " + m.Groups["city"].Value);
}
Should give the following results:
Found 1 match:
1. >Las Vegas,, IL 60942< has 3 groups:
1. Las Vegas,, (city)
2. IL (state)
3. 60942 (zip)
String literals for use in programs:
C#
@">(?<city>.*) (?<state>.*) (?<zip>.*?)<"
One possible solution - not optimal but easy to code - is to reverse the string, then to split that string using the "normal" function, then to reverse each of the individual split parts.
Another possible solution is to use regular expressions instead.
I think you should do it like this:
var s = splitAgain[0];
var zipCodeStart = s.LastIndexOf(' ');
var zipCode = s.Substring(zipCodeStart + 1);
s = s.Substring(0, zipCodeStart);
var stateStart = s.LastIndexOf(' ');
var state = s.Substring(stateStart + 1);
var city = s.Substring(0, stateStart );
var result = new [] {zipCode, state, city};
Result will contain what you requested.
If Split could do everything there would be so many overloads that it would become confusing.
Don't use split, just custom code it with substrings and lastIndexOf.
string str = "Hoopeston,, IL 60942";
string[] parts = new string[3];
int place = str.LastIndexOf(' ');
parts[0] = str.Substring(place+1);
int place2 = str.LastIndexOf(' ',place-1);
parts[1] = str.Substring(place2 + 1, place - place2 -1);
parts[2] = str.Substring(0, place2);
You can use a regular expression to get the three parts of the string inside the tag, and use LINQ extensions to get the strings in the right order.
Example:
string startingString = "<span class=\"address\">East St Louis,, IL 60942</span><br>";
string[] city =
Regex.Match(startingString, @"^.+>(.+) (\S+) (\S+?)<.+$")
.Groups.Cast<Group>().Skip(1)
.Select(g => g.Value)
.Reverse().ToArray();
Console.WriteLine(city[0]);
Console.WriteLine(city[1]);
Console.WriteLine(city[2]);
Output:
60942
IL
East St Louis,,
How about
using System.Linq;
...
splitAgain[0].Split(' ').Reverse().ToArray()
-edit-
OK, I missed the last part about multi-word cities; you can still use LINQ though:
splitAgain[0].Split(' ').Reverse().Take(2).ToArray()
would get you the
[0]="60942"
[1]="IL"
The city would not be included here though, you could still do the whole thing in one statement but it would be a little messy:
var elements = splitAgain[0].Split(' ');
var result = elements
.Reverse()
.Take(2)
.Concat( new[ ] { String.Join( " " , elements.Take( elements.Length - 2 ).ToArray( ) ) } )
.ToArray();
So we're
Splitting the string,
Reversing it,
Taking the two first elements (the last two originally)
Then we make a new array with a single string element, and build that string from the original array of elements minus the last two (state and ZIP code)
As I said, a little messy, but it will get you the array you want. If you don't need the array in that format, you could obviously simplify the above code a little.
You could also do:
var result = new[ ]{
elements[elements.Length - 1], //last element
elements[elements.Length - 2], //second to last
String.Join( " " , elements.Take( elements.Length - 2 ).ToArray( ) ) //rebuild original string - 2 last elements
};
At first I thought you should use Array.Reverse() method, but I see now that it is the splitting on the ' ' (space) that is the issue.
Your first value could have a space in it (e.g. "New York"), so you don't want to split on spaces.
If you know the string is only ever going to have 3 values in it, then you could use String.LastIndexOf(" ") and String.Substring() to cut off the last value, do the same again to get the middle value, and you will be left with the first value, with or without spaces.
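The repeated LastIndexOf/Substring idea can be generalized into a right-to-left counterpart of Split(separator, count). The following is a sketch under that reading of the question; RightSplitter.SplitFromRight is a hypothetical name, not a framework method.

```csharp
using System;
using System.Collections.Generic;

public static class RightSplitter
{
    // Hypothetical helper: peels off up to 'maxParts - 1' pieces from the right,
    // then returns whatever is left as the final element.
    public static string[] SplitFromRight(string text, char separator, int maxParts)
    {
        var parts = new List<string>();
        string rest = text;

        while (parts.Count < maxParts - 1)
        {
            int index = rest.LastIndexOf(separator);
            if (index < 0) break;                 // no more separators to split on

            parts.Add(rest.Substring(index + 1)); // piece to the right of the separator
            rest = rest.Substring(0, index);      // keep working on the left remainder
        }

        parts.Add(rest);
        return parts.ToArray();
    }
}
```

RightSplitter.SplitFromRight("East St Louis,, IL 60942", ' ', 3) should return the ZIP code, the state and the full city name, in that order.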
I was facing a similar issue with audio file name conventions.
I did it this way: convert the string to a char array, reverse it and split, then reverse each part back to normal.
char[] addressInCharArray = fullAddress.ToCharArray();
Array.Reverse(addressInCharArray);
string[] parts = (new string(addressInCharArray)).Split(new char[] { ' ' }, 3);
string[] subAddress = new string[parts.Length];
int j = 0;
foreach (string part in parts)
{
addressInCharArray = part.ToCharArray();
Array.Reverse(addressInCharArray);
subAddress[j++] = new string(addressInCharArray);
}

how to place - in a string

I have a string "8329874566".
I want to place - in the string like this "832-98-4566"
Which string function can I use?
I would have done something like this..
string value = "8329874566";
value = value.Insert(6, "-").Insert(3, "-");
You convert it to a number and then format the string.
What I like most about this is that it's easier to read and understand what's going on than using a few Substring calls.
string str = "832984566";
string val = long.Parse(str).ToString("###-##-####");
There may be a tricky-almost-unreadable regex solution, but this one is pretty readable, and easy.
The first parameter of the .Substring() method is the index at which to start taking characters, and the second is the number of characters to take; if you omit the second, you get everything up to the end of the string:
String value = "8329874566";
String Result = value.Substring(0,3) + "-" + value.Substring(3,2) + "-" + value.Substring(6);
--[edit]--
Just noticed that the expected result you gave doesn't use one of the digits at all (the '7'). If you want it, change the last Substring argument to "5"; if you want the '7' but don't want five digits in the last group, use "5,4".
Are you trying to format this like an American Social Security number, i.e., with a hyphen after the third and fifth digits? If so:
string s = "8329874566";
string t = String.Format("{0}-{1}-{2}", s.Substring(0, 3), s.Substring(3, 2), s.Substring(5));
Just out of completeness, a regular expression variant:
Regex.Replace(s, @"(\d{3})(\d{2})(\d{4})", "$1-$2-$3");
I consider the Insert variant to be the cleanest, though.
This works fine, and I think that is more clear:
String value = "8329874566";
value = value.Insert(3, "-").Insert(6, "-");
The console output shows this:
832-98-74566
If the hyphens are to go in the same place each time, then you could simply concatenate the pieces of the original string like this:
// 0123456789 <- index
string number = "8329874566";
string formatted = number.Substring(0, 3) + "-" + number.Substring(3, 2) + "-" + number.Substring(5); // 'new' is a reserved word, so use another name
For a general way of making mutable strings, use the StringBuilder class. This allows deletions and insertions to be made before calling ToString to produce the final string.
You could try the following:
string strNumber = "8329874566";
string strNewNumber = strNumber.Substring(0,3) + "-" + strNumber.Substring(3,2) + "-" + strNumber.Substring(6);
or something along these lines
string val = "832984566";
string result = String.Format("{0}-{1}-{2}", val.Substring(0,3), val.Substring(3,2), val.Substring(5,4));
var result = string.Concat(value.Substring(0,3), "-", value.Substring(3,2), "-", value.Substring(5,4));
or
var value = "8329874566".Insert(3, "-").Insert(6, "-");
Now how about this for a general solution?
// uglified code to fit within horizontal limits
public static string InsertAtIndices
(this string original, string insertion, params int[] insertionPoints) {
var mutable = new StringBuilder(original);
var validInsertionPoints = insertionPoints
.Distinct()
.Where(i => i >= 0 && i < original.Length)
.OrderByDescending(i => i);
foreach (int insertionPoint in validInsertionPoints)
mutable.Insert(insertionPoint, insertion);
return mutable.ToString();
}
Usage:
string ssn = "832984566".InsertAtIndices("-", 3, 5);
string crazy = "42387542342309856340924803"
.InsertAtIndices(":", 1, 2, 3, 4, 5, 6, 17, 200, -1, -1, 2, 3, 3, 4);
Console.WriteLine(ssn);
Console.WriteLine(crazy);
Output:
832-98-4566
4:2:3:8:7:5:42342309856:340924803
Overkill? Yeah, maybe...
P.S. Yes, I am regex illiterate--something I hope to rectify someday.
A straightforward (but not flexible) approach would be looping over the characters of the string while keeping a counter running. You can then construct a new string character by character. You can add the '-' character after the 3rd and 5th character.
A better approach may be to use a function that inserts a string at a specific index; String.Insert() would do well. The only thing to pay attention to is that the later indexes shift by one with each insert.
EDIT more language-specific as per comments
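To illustrate the index shift mentioned above, here is a small sketch that inserts left to right and keeps a running offset. HyphenInserter.InsertAt is a hypothetical helper; it assumes the positions are given in ascending order and refer to the original, unmodified string.

```csharp
using System;

public static class HyphenInserter
{
    // Inserts 'separator' before the characters at the given positions.
    // 'positions' refer to the ORIGINAL string and must be in ascending order;
    // 'shift' compensates for the characters already inserted.
    public static string InsertAt(string text, string separator, params int[] positions)
    {
        string result = text;
        int shift = 0;

        foreach (int position in positions)
        {
            result = result.Insert(position + shift, separator);
            shift += separator.Length;
        }

        return result;
    }
}
```

HyphenInserter.InsertAt("832984566", "-", 3, 5) puts a hyphen after the third and fifth digits. Inserting from the highest index down, as some of the answers above do, avoids the offset bookkeeping altogether.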
