Detecting newline in string and adding a character before it - c#

Hi, I have the following string:
* lalalalalaal
* 12121212121212
* 36363636363636
* 21454545454545454
Every line of the list starts with "\r\n* ".
Is there a way to detect the "\r\n* " sequence at the beginning of each line and replace it with the numbers 1, 2, 3, ... n? For example, something like this:
1. lalalalalaal
2. 12121212121212
3. 36363636363636
4. 21454545454545454
I imagine building an array and running a for loop would be required, but I cannot work out where I am supposed to start.

If I understand you correctly, you have a string that looks like this:
"\r\n* lalalalalaal\r\n* 12121212121212\r\n* 36363636363636\r\n* 21454545454545454"
And you want to replace "\r\n*" with "\r\n1.", where the number 1 increments each time the search string is found.
If so, here's one way to do it: use the IndexOf method to find the location of the string you're searching for, and keep a counter variable that increments every time you find the search term. Then use Substring to get the substrings before and after the part to replace ('*'), and put the counter's value between them:
static string ReplaceWithIncrementingNumber(string input, string find, string partToReplace)
{
    if (input == null || find == null ||
        partToReplace == null || !find.Contains(partToReplace))
    {
        return input;
    }
    // Get the index of the first occurrence of our 'find' string
    var index = input.IndexOf(find);
    // Track the number of occurrences we've found, to use as a replacement string
    var counter = 1;
    while (index > -1)
    {
        // Get the leading string up to '*', add the counter, then add the trailing string
        input = input.Substring(0, index) +
            find.Replace(partToReplace, $"{counter++}.") +
            input.Substring(index + find.Length);
        // Find the next occurrence of our 'find' string
        index = input.IndexOf(find, index + find.Length);
    }
    return input;
}
Here's a sample using your input string:
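static void Main()
{
    var input = "\r\n* lalalalalaal\r\n* 12121212121212\r\n* " +
        "36363636363636\r\n* 21454545454545454";
    Console.WriteLine(ReplaceWithIncrementingNumber(input, "\r\n*", "*"));
    Console.Write("\nDone! Press any key to exit...");
    Console.ReadKey();
}
Output (after a leading empty line, since the input starts with "\r\n"):
1. lalalalalaal
2. 12121212121212
3. 36363636363636
4. 21454545454545454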

You can use LINQ and String.Replace to achieve this. (I believe you already have your strings in a list, as mentioned in the second part of your question.)
var result = list.Select((x, index) => $"{index + 1}. {x.Replace("\r\n* ", string.Empty)}");
If you do not have them in a list, you can split your string first:
// Environment.NewLine is "\r\n" only on Windows, so split on the literal sequence
var result = str.Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries)
    .Select((x, index) => $"{index + 1}. {x.Replace("* ", string.Empty)}");
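A further option, not taken from either answer, is to let Regex.Replace do the searching and supply the running number through a MatchEvaluator callback. This is only a sketch: it assumes every bullet is "* " at the very start of a line, and the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BulletNumbering
{
    // Replaces each "* " found at the start of a line with "1. ", "2. ", ...
    // in order of appearance. Everything else is left untouched.
    public static string NumberBullets(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        int counter = 0;
        // RegexOptions.Multiline makes ^ match after every '\n',
        // not only at the start of the whole string.
        return Regex.Replace(
            input,
            @"^\* ",
            match => $"{++counter}. ",
            RegexOptions.Multiline);
    }
}
```

Calling BulletNumbering.NumberBullets on the string from the question keeps the "\r\n" separators and only swaps the bullets for numbers.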

Related

PigLatin how can I strip punctuation from a string? And Then add it back?

I'm working on a program for class called Pig Latin. It works for what I need for class: it just asks you to type in a phrase to convert. But I noticed that if I type a sentence with punctuation at the end, it messes up the translation of the last word. I'm trying to figure out the best way to fix this. I'm new at programming, but I think I need a way to check the last character of a word for punctuation, remove it before translation, and then add it back. I'm not sure how to do that. I've been reading about char.IsPunctuation, but I'm not sure which part of my code that check would go in.
public static string MakePigLatin(string str)
{
    string[] words = str.Split(' ');
    str = String.Empty;
    for (int i = 0; i < words.Length; i++)
    {
        if (words[i].Length <= 1) continue;
        string pigTrans = new String(words[i].ToCharArray());
        pigTrans = pigTrans.Substring(1, pigTrans.Length - 1) + pigTrans.Substring(0, 1) + "ay ";
        str += pigTrans;
    }
    return str.Trim();
}

The following should get you strings of letters for converting while passing through any non-letter characters that follow them.
Splitter based on Splitting a string in C#
// Requires: using System.Text; using System.Text.RegularExpressions;
public static string MakePigLatin(string str) {
    // Group 1: a run of letters to translate. Group 2: the non-letters that follow it.
    MatchCollection matches = Regex.Matches(str, @"([a-zA-Z]*)([^a-zA-Z]*)");
    StringBuilder result = new StringBuilder(str.Length * 2);
    for (int i = 0; i < matches.Count; ++i) {
        string pigTrans = matches[i].Groups[1].Captures[0].Value ?? string.Empty;
        if (pigTrans.Length > 1) {
            pigTrans = pigTrans.Substring(1) + pigTrans.Substring(0, 1) + "ay";
        }
        result.Append(pigTrans).Append(matches[i].Groups[2].Captures[0].Value);
    }
    return result.ToString();
}
The matches variable should contain all the match collections of 2 groups. The first group will be 0 or more letters to translate followed by a second group of 0 or more non-letters to pass through. The StringBuilder should be more memory efficient than concatenating System.String values. I gave it a starting allocation of double the initial string size just to avoid having to double the allocated space. If memory is tight, maybe 1.25 or 1.5 instead of 2 would be better, but you'd probably have to convert it back to int after. I took the length calculation off your Substring call because leaving it out grabs everything to the end of the string already.
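If you would rather stay close to the original Split-based code and use char.IsPunctuation as the question suggests, something along these lines might work. It is a sketch only: it handles punctuation at the end of a word (not at the start or in the middle), and the class and method names are invented for the example.

```csharp
using System;

public static class PigLatinSketch
{
    // Translates one word, keeping any trailing punctuation in place.
    public static string TranslateWord(string word)
    {
        if (string.IsNullOrEmpty(word)) return word;

        // Walk back from the end past any trailing punctuation characters.
        int end = word.Length;
        while (end > 0 && char.IsPunctuation(word[end - 1])) end--;

        string core = word.Substring(0, end);   // the letters to translate
        string trailing = word.Substring(end);  // punctuation to add back

        // Same rule as the answer above: single-character words are left alone.
        if (core.Length > 1)
        {
            core = core.Substring(1) + core.Substring(0, 1) + "ay";
        }
        return core + trailing;
    }

    public static string MakePigLatin(string sentence)
    {
        if (string.IsNullOrEmpty(sentence)) return sentence;

        string[] words = sentence.Split(' ');
        for (int i = 0; i < words.Length; i++)
        {
            words[i] = TranslateWord(words[i]);
        }
        return string.Join(" ", words);
    }
}
```

With this, PigLatinSketch.MakePigLatin("hello world!") gives "ellohay orldway!".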

Extracting substring based on the same identifier in two locations

I found this question, which achieves what I am looking for, however I only have one problem: the "start" and "end" of the substring are the same character.
My string is:
.0.label unicode "Area - 110"
and I want to extract the text between the inverted commas ("Area - 110").
In the linked question, the answers are all using specific identifiers, and IndexOf solutions. The problem is that if I do the same, IndexOf will likely return the same value.
Additionally, if I use Split methods, the text I want to keep is not a fixed length - it could be one word, it could be seven; so I am also having issues specifying the indexes of the first and last word in that collection as well.

"The problem is that if I do the same, IndexOf will likely return the same value."
A common trick in this situation is to use LastIndexOf to find the location of the closing double-quote:
int start = str.IndexOf('"');
int end = str.LastIndexOf('"');
if (start >= 0 && end > start) {
    // We have two separate locations
    Console.WriteLine(str.Substring(start+1, end-start-1));
}

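I would do it like this:
string input = ".0.label unicode \"Area - 110\"";
// Drop everything up to and including the first quote
string str = input.Substring(input.IndexOf("\"") + 1);
// Keep everything before the next quote
str = str.Substring(0, str.IndexOf("\""));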
In fact, this is one of my most used helper methods/extensions, because it is quite versatile:
/// <summary>
/// Isolates the text in between the parameters, exclusively, using invariant, case-sensitive comparison.
/// Both parameters may be null to skip either step. If specified but not found, a FormatException is thrown.
/// </summary>
// Note: Truncate(80) is another custom string extension (not shown here) that shortens the text for the error message.
public static string Isolate(this string str, string entryString, string exitString)
{
    if (!string.IsNullOrEmpty(entryString))
    {
        int entry = str.IndexOf(entryString, StringComparison.InvariantCulture);
        if (entry == -1) throw new FormatException($"String.Isolate failed: \"{entryString}\" not found in string \"{str.Truncate(80)}\".");
        str = str.Substring(entry + entryString.Length);
    }
    if (!string.IsNullOrEmpty(exitString))
    {
        int exit = str.IndexOf(exitString, StringComparison.InvariantCulture);
        if (exit == -1) throw new FormatException($"String.Isolate failed: \"{exitString}\" not found in string \"{str.Truncate(80)}\".");
        str = str.Substring(0, exit);
    }
    return str;
}
You'd use that like this:
string str = ".0.label unicode \"Area - 110\"";
string output = str.Isolate("\"", "\"");
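The worry that IndexOf returns the same position twice can also be avoided with the overload that takes a start index: search for the second quote beginning one character after the first. Here is a small sketch of that idea; the helper name and the choice to return null when fewer than two delimiters exist are my own assumptions.

```csharp
using System;

public static class QuoteExtraction
{
    // Returns the text between the first two occurrences of 'delimiter',
    // or null when the text contains fewer than two of them.
    public static string Between(string text, char delimiter)
    {
        if (text == null) return null;

        int start = text.IndexOf(delimiter);
        if (start < 0) return null;

        // Start the second search just past the first hit so it cannot match again.
        int end = text.IndexOf(delimiter, start + 1);
        if (end < 0) return null;

        return text.Substring(start + 1, end - start - 1);
    }
}
```

Unlike the LastIndexOf approach, this returns the first quoted section when a line contains several, which may or may not be what you want.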

When using IndexOf and Substring, how do I parse the right start and end indexes? And how do I encode Hebrew chars?

I have this code:
string firstTag = "Forums2008/forumPage.aspx?forumId=";
string endTag = "</a>";
index = forums.IndexOf(firstTag, index1);
if (index == -1)
    continue;
var secondIndex = forums.IndexOf(endTag, index);
result = forums.Substring(index + firstTag.Length + 12, secondIndex - (index + firstTag.Length - 50));
The string I want to extract from is, for example:
הנקה
What I want to get is only the word after the title: הנקה
The second problem is that when I extract it, instead of Hebrew I see gibberish like this: ������

One powerful way to do this is to use Regular Expressions instead of trying to find a starting position and use a substring. Try out this code, and you'll see that it extracts the anchor tag's title:
var input = "הנקה";
var expression = new System.Text.RegularExpressions.Regex(@"title=\""([^\""]+)\""");
var match = expression.Match(input);
if (match.Success) {
    Console.WriteLine(match.Groups[1]);
}
else {
    Console.WriteLine("not found");
}
And for the curious, here is a version in JavaScript:
var input = 'הנקה';
var expression = new RegExp('title=\"([^\"]+)\"');
var results = expression.exec(input);
if (results) {
    document.write(results[1]);
}
else {
    document.write("not found");
}

Okay, here is a solution using String.Substring(), String.Split() and String.IndexOf():
String str = "הנקה"; // Assume this is the string being passed in; the quotes inside it have to be escaped
int splitStart = str.IndexOf("title="); // Where to start extracting
int splitEnd = str.LastIndexOf("</a>"); // Where to stop
/* What we try to extract is this: title="הנקה">הנקה
 * (shown without escape sequences)
 */
String extracted = str.Substring(splitStart, splitEnd - splitStart); // Extract the required portion
String[] splitted = extracted.Split('"'); // Now split on "
Console.WriteLine(splitted[1]); // Prints ???? on the console, but set a breakpoint here and check the values in the split array
The catch is that the quotes in the input string have to be escaped, which looks unusual. You can ignore that, since you are simply passing in the string to scan.
This actually works, but you cannot see it from the output of Console.WriteLine(splitted[1]);
If you set a breakpoint and inspect the split array, you can see that the text is extracted.
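As for the garbled Hebrew, two separate things can go wrong: the bytes may be decoded with the wrong encoding when the page is read, or the console may not be able to print the characters. The sketch below shows one way to address each. It assumes the page is UTF-8, which the question does not actually state, and the class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class EncodingSketch
{
    // Decodes raw bytes (for example a downloaded page) with an explicitly
    // chosen encoding instead of relying on a default.
    public static string Decode(byte[] raw, Encoding encoding)
    {
        return encoding.GetString(raw);
    }

    // Switches the console output to UTF-8 before printing. Whether the glyphs
    // actually render still depends on the console font.
    public static void PrintUtf8(string text)
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(text);
    }
}
```

Decoding the same bytes with a mismatched encoding such as Encoding.ASCII yields question marks instead of the original text, which is the kind of symptom described in the question.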

Splitting data with inconsistent delimiters

I have data files coming in on a server that I need to split into [date time] and [value]. Most of them have a single delimiter between the time and the value, and a space between the date and the time. I already have a program processing the data with a simple Split(char[]), but I have now found data where the delimiter is a space, and I am wondering how best to tackle this.
Most files I have encountered look like this:
18-06-2014 12:00:00|220.6
The delimiters vary, but I handled that with a char[]. Today, however, I ran into a problem with this format:
18-06-2014 12:00:00 220.6
This complicates things a little. Would the easy solution be to add a space to my split characters and, when I get three parts, combine the first two before processing?
I'm looking for a second opinion on this matter. Also, the date format can change to something like d/m/yy, and the number of lines can run into the millions, so I would like to keep it as efficient as possible.

Yes, I believe the most efficient solution is to add space as a delimiter and then just combine the first two parts if you get three. That is going to be more efficient than a regex.
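A rough sketch of that approach follows. The delimiter set is a guess (the question only shows '|' and a space), and the class and method names are invented for the example. Note that once a space is in the delimiter set, both sample formats split into three parts.

```csharp
using System;

public static class ImportLineSplitter
{
    // Hypothetical delimiter set: only '|' and ' ' appear in the question.
    private static readonly char[] Delimiters = { '|', ';', '\t', ' ' };

    // Splits a line into its date-time part and its value part.
    public static bool TrySplit(string line, out string dateTime, out string value)
    {
        dateTime = null;
        value = null;
        if (string.IsNullOrEmpty(line)) return false;

        string[] parts = line.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
        switch (parts.Length)
        {
            case 3:
                // date, time, value: glue the first two back together
                dateTime = parts[0] + " " + parts[1];
                value = parts[2];
                return true;
            case 2:
                // a date-time without a space, then the value
                dateTime = parts[0];
                value = parts[1];
                return true;
            default:
                return false;
        }
    }
}
```

Both "18-06-2014 12:00:00|220.6" and "18-06-2014 12:00:00 220.6" come out as "18-06-2014 12:00:00" and "220.6".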

You've got a string such as 18-06-2014 12:00:00 220.6, where the first 19 characters are the date and time, the next character is the separator, and the remaining characters are the value. So:
var test = "18-06-2014 12:00:00|220.6";
var dateString = test.Remove(19);
var val = test.Substring(20);
Added normalization:
// Requires: using System; using System.Globalization; using System.IO;
static void Main(string[] args) {
    var test = "18-06-2014 12:00:00|220.6";
    var test2 = "18-6-14 12:00:00|220.6";
    var test3 = "8-06-14 12:00:00|220.6";
    Console.WriteLine(test);
    Console.WriteLine(TryNormalizeImportValue(test));
    Console.WriteLine(test2);
    Console.WriteLine(TryNormalizeImportValue(test2));
    Console.WriteLine(test3);
    Console.WriteLine(TryNormalizeImportValue(test3));
}
private static string TryNormalizeImportValue(string value) {
    var valueSplittedByDateSeparator = value.Split('-');
    if (valueSplittedByDateSeparator.Length < 3) throw new InvalidDataException();
    var normalizedDay = NormalizeImportDayValue(valueSplittedByDateSeparator[0]);
    var normalizedMonth = NormalizeImportMonthValue(valueSplittedByDateSeparator[1]);
    var valueYearPartSplittedByDateTimeSeparator = valueSplittedByDateSeparator[2].Split(' ');
    if (valueYearPartSplittedByDateTimeSeparator.Length < 2) throw new InvalidDataException();
    var normalizedYear = NormalizeImportYearValue(valueYearPartSplittedByDateTimeSeparator[0]);
    var valueTimeAndValuePart = valueYearPartSplittedByDateTimeSeparator[1];
    return string.Concat(normalizedDay, '-', normalizedMonth, '-', normalizedYear, ' ', valueTimeAndValuePart);
}
private static string NormalizeImportDayValue(string value) {
    // Pad a single-digit day with a leading zero
    return value.Length == 2 ? value : "0" + value;
}
private static string NormalizeImportMonthValue(string value) {
    // Pad a single-digit month with a leading zero
    return value.Length == 2 ? value : "0" + value;
}
private static string NormalizeImportYearValue(string value) {
    // Prefix a two-digit year with the current century
    return value.Length == 4 ? value : DateTime.Now.Year.ToString(CultureInfo.InvariantCulture).Remove(2) + value;
}

Well, you can use this regex to get the date and the value.
(((0[1-9]|[12][0-9]|3[01])-(0[1-9]|1[012])-(19|20)\d\d)\s((\d{2}:?){3})|(\d+\.?\d+))
This will give you 2 matches
1º 18-06-2014 12:00:00
2º 220.6
Example:
http://regexr.com/391d3

This regex matches both kinds of strings, capturing the two tokens to Groups 1 and 2.
Note that we are not using \d because in .NET it can match any Unicode digits such as Thai...
The key is the [ |] character class, which specifies your two allowable delimiters.
Here is the regex:
^([0-9]{2}-[0-9]{2}-[0-9]{4} (?:[0-9]{2}:){2}[0-9]{2})[ |]([0-9]{3}\.[0-9])$
In the demo, please pay attention to the capture Groups in the right pane.
Here is how to retrieve the values:
// s1 is the line to parse, e.g. "18-06-2014 12:00:00 220.6"
var myRegex = new Regex(@"^([0-9]{2}-[0-9]{2}-[0-9]{4} (?:[0-9]{2}:){2}[0-9]{2})[ |]([0-9]{3}\.[0-9])$", RegexOptions.IgnoreCase);
string mydate = myRegex.Match(s1).Groups[1].Value;
Console.WriteLine(mydate);
string myvalue = myRegex.Match(s1).Groups[2].Value;
Console.WriteLine(myvalue);
Please let me know if you have questions

Given the provided format I'd use something like
char delimiter = ' '; //or whatever the delimiter for the specific file is, this can be set in a previous step
int index = line.LastIndexOf(delimiter);
var date = line.Remove(index);
var value = line.Substring(++index);
If there are that many lines and efficiency matters, you could determine the delimiter once, on the first line, by looping back from the end to find the first character that is not a digit or a dot (or a comma, if the value can contain those), and then use something like the above.
If each line can contain a different delimiter, you could always track back to the first non-value character as described above and still maintain adequate performance.
Edit: for completeness' sake, to find the delimiter you could perform the following once per file (provided that the delimiter stays consistent within the file):
char delimiter = '\0';
for (int i = line.Length - 1; i >= 0; i--)
{
    var c = line[i];
    if (!char.IsDigit(c) && c != '.')
    {
        delimiter = c;
        break;
    }
}
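For the case where every line may use a different delimiter, the backward scan and the split can be folded into one pass per line. The following is a sketch under the same assumption as above, namely that the value consists only of digits and dots (so a negative value with a leading '-' would not be handled); the names are made up for illustration.

```csharp
using System;

public static class TrailingValueSplitter
{
    // Splits at the last character that cannot be part of the value,
    // whatever that delimiter character happens to be.
    public static bool TrySplit(string line, out string dateTime, out string value)
    {
        dateTime = null;
        value = null;
        if (string.IsNullOrEmpty(line)) return false;

        for (int i = line.Length - 1; i >= 0; i--)
        {
            char c = line[i];
            if (!char.IsDigit(c) && c != '.')
            {
                // Reject lines with nothing before or nothing after the delimiter.
                if (i == 0 || i == line.Length - 1) return false;

                dateTime = line.Substring(0, i);
                value = line.Substring(i + 1);
                return true;
            }
        }
        return false; // the whole line looked like a value
    }
}
```

This avoids allocating the intermediate array that Split produces, which may matter with millions of lines, though that would need measuring.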

How to extract string at a certain character that is repeated within string?

How can I get "MyLibrary.Resources.Images.Properties" and "Condo.gif" from a "MyLibrary.Resources.Images.Properties.Condo.gif" string?
I also need it to be able to handle something like "MyLibrary.Resources.Images.Properties.legend.House.gif" and return "House.gif" and "MyLibrary.Resources.Images.Properties.legend".
IndexOf and LastIndexOf wouldn't work because I need the second-to-last '.' character.
Thanks in advance!
UPDATE
Thanks for the answers so far, but I really need it to be able to handle different namespaces. So really what I'm asking is: how do I split on the second-to-last '.' character in a string?

You can use LINQ to do something like this:
// Requires: using System.Linq;
string target = "MyLibrary.Resources.Images.Properties.legend.House.gif";
var elements = target.Split('.');
const int NumberOfFileNameElements = 2;
string fileName = string.Join(
    ".",
    elements.Skip(elements.Length - NumberOfFileNameElements));
string path = string.Join(
    ".",
    elements.Take(elements.Length - NumberOfFileNameElements));
This assumes that the file name part only contains a single . character, so to get it you skip the number of remaining elements.

You can either use a Regex or String.Split with '.' as the separator and return the second-to-last + '.' + last pieces.

You can look for IndexOf("MyLibrary.Resources.Images.Properties."), add that to "MyLibrary.Resources.Images.Properties.".Length, and then call .Substring(..) from that position.

If you know exactly what you're looking for, and it's at the end of the string, you could use string.EndsWith. Something like
if ("MyLibrary.Resources.Images.Properties.Condo.gif".EndsWith("Condo.gif"))
If that's not the case, check out regular expressions. Then you could do something like
if (Regex.IsMatch("MyLibrary.Resources.Images.Properties.Condo.gif", @"Condo\.gif$"))
Or a more generic way: split the string on '.' then grab the last two items in the array.

string input = "MyLibrary.Resources.Images.Properties.legend.House.gif";
//if string isn't already validated, make sure there are at least two
//periods here or you'll error out later on.
int index = input.LastIndexOf('.', input.LastIndexOf('.') - 1);
string first = input.Substring(0, index);
string second = input.Substring(index + 1);

Try splitting the string into an array, by separating it by each '.' character.
You will then have something like:
{"MyLibrary", "Resources", "Images", "Properties", "legend", "House", "gif"}
You can then take the last two elements.

Just break down and do it in a char loop:
int NthLastIndexOf(string str, char ch, int n)
{
    if (n <= 0) throw new ArgumentException();
    for (int idx = str.Length - 1; idx >= 0; --idx)
        if (str[idx] == ch && --n == 0)
            return idx;
    return -1;
}
This is less expensive than trying to coax it using string splitting methods and isn't a whole lot of code.
string s = "1.2.3.4.5";
int idx = NthLastIndexOf(s, '.', 3);
string a = s.Substring(0, idx); // "1.2"
string b = s.Substring(idx + 1); // "3.4.5"
