How to remove a duplicate set of characters in a string - C#

For example a string contains the following (the string is variable):
http://www.google.comhttp://www.google.com
What would be the most efficient way of removing the duplicate url here - e.g. output would be:
http://www.google.com

I assume that input contains only urls.
string input = "http://www.google.comhttp://www.google.com";
// this will get you distinct URLs but without "http://" at the beginning
IEnumerable<string> distinctAddresses = input
    .Split(new[] {"http://"}, StringSplitOptions.RemoveEmptyEntries)
    .Distinct();
StringBuilder output = new StringBuilder();
foreach (string distinctAddress in distinctAddresses)
{
    // when building the output, insert "http://" before each address so
    // that it resembles the original
    output.Append("http://");
    output.Append(distinctAddress);
}
Console.WriteLine(output);

Efficiency has various definitions: code size, total execution time, CPU usage, space usage, time to write the code, etc. If you want to be "efficient", you should know which one of these you're trying for.
I'd do something like this:
string url = "http://www.google.comhttp://www.google.com";
if (url.Length % 2 == 0)
{
    string secondHalf = url.Substring(url.Length / 2);
    if (url.StartsWith(secondHalf))
    {
        url = secondHalf;
    }
}
Depending on the kinds of duplicates you need to remove, this may or may not work for you.

Collect the strings into a list and use Distinct. If your string consists of http addresses, you can split it with the regex http:.+?(?=http:|$) and RegexOptions.Singleline:
var distinctList = list.Distinct(StringComparer.CurrentCultureIgnoreCase).ToList();
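A minimal sketch of how the regex and Distinct could be combined, assuming the input is nothing but http:// URLs run together. The class and method names (UrlDeduplicator, DistinctUrls) are hypothetical and only serve the illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class UrlDeduplicator
{
    // Hypothetical helper: splits a run of concatenated http: URLs
    // and returns each distinct URL once (case-insensitive comparison).
    public static List<string> DistinctUrls(string input)
    {
        // Lazily match from "http:" up to the next "http:" or the end of the input.
        MatchCollection matches =
            Regex.Matches(input, @"http:.+?(?=http:|$)", RegexOptions.Singleline);

        return matches.Cast<Match>()
                      .Select(m => m.Value)
                      .Distinct(StringComparer.CurrentCultureIgnoreCase)
                      .ToList();
    }
}
```

Calling UrlDeduplicator.DistinctUrls("http://www.google.comhttp://www.google.com") yields a list with the single entry http://www.google.com. URLs that use https: are not handled by this pattern.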

Given that you don't know the length of the string, whether anything is duplicated, or what is duplicated:
string yourPrimaryString = "http://www.google.comhttp://www.google.com";
const int minLength = 4; // shortest run that counts as a duplicate; adjust as needed
for (int i = 0; i + minLength <= yourPrimaryString.Length; i++)
{
    // try the longest candidate first so a whole URL is removed in one step
    for (int len = (yourPrimaryString.Length - i) / 2; len >= minLength; len--)
    {
        string search = yourPrimaryString.Substring(i, len);
        // look for a second occurrence after the first one
        int duplicate = yourPrimaryString.IndexOf(search, i + len, StringComparison.Ordinal);
        if (duplicate != -1)
        {
            yourPrimaryString = yourPrimaryString.Remove(duplicate, len);
            break;
        }
    }
}
This iterates over every start position, takes each candidate substring from longest to shortest, and searches for a second occurrence of it. Without a minimum length there is a problem: in ABCDA it searches for A, finds the second A, and removes it. So you need to specify how long a duplicate has to be if you want to make it variable; minLength above is only a guess. Maybe my code helps you.

Related

PigLatin how can I strip punctuation from a string? And Then add it back?

I'm working on a program for class called Pig Latin. It works for what I need for class, which only asks you to type in a phrase to convert. But I noticed that if I type a sentence with punctuation at the end, it messes up the translation of the last word. I'm trying to figure out the best way to fix this. I'm new at programming, but I think I need a way to check the last character of each word for punctuation, remove it before translation, and then add it back. I'm not sure how to do that. I've been reading about char.IsPunctuation, but I'm not sure which part of my code should do that check.
public static string MakePigLatin(string str)
{
string[] words = str.Split(' ');
str = String.Empty;
for (int i = 0; i < words.Length; i++)
{
if (words[i].Length <= 1) continue;
string pigTrans = new String(words[i].ToCharArray());
pigTrans = pigTrans.Substring(1, pigTrans.Length - 1) + pigTrans.Substring(0, 1) + "ay ";
str += pigTrans;
}
return str.Trim();
}

The following should get you strings of letters for converting while passing through any non-letter characters that follow them.
Splitter based on Splitting a string in C#
public static string MakePigLatin(string str) {
    // group 1: a run of letters to translate; group 2: the non-letters that follow it
    MatchCollection matches = Regex.Matches(str, @"([a-zA-Z]*)([^a-zA-Z]*)");
    StringBuilder result = new StringBuilder(str.Length * 2);
    for (int i = 0; i < matches.Count; ++i) {
        string pigTrans = matches[i].Groups[1].Captures[0].Value ?? string.Empty;
        if (pigTrans.Length > 1) {
            pigTrans = pigTrans.Substring(1) + pigTrans.Substring(0, 1) + "ay";
        }
        result.Append(pigTrans).Append(matches[i].Groups[2].Captures[0].Value);
    }
    return result.ToString();
}
The matches variable holds all the matches, each with 2 groups. The first group is 0 or more letters to translate, followed by a second group of 0 or more non-letters to pass through. The StringBuilder should be more memory efficient than concatenating System.String values. I gave it a starting allocation of double the initial string size just to avoid having to grow the allocated space. If memory is tight, maybe 1.25 or 1.5 instead of 2 would be better, but you'd have to convert the result back to int. I took the length argument off your Substring call because leaving it out already grabs everything to the end of the string.

C# WPF Separate characters from a string (starting from the back)

I have a strange string like this:
www.asdsad.de/dsfdsf/sdfdsf=dsfdsfs?dsfsndfsajdn=sfdjasdhads=test.xlsx
I would like to extract only test.xlsx.
So I want to split the string starting from the end.
That is, search backwards for the first = sign and return everything from that sign to the end of the string.
What's the best way to do this?
Unfortunately, I don't know how to do this with Substring, since the length can always be different. But I know that the part I need is at the end, and that the unnecessary part ends at the first = sign counted from the back.

Yes, Substring will do, and there's no need to know the length:
string source = "www.asdsad.de/dsfdsf/sdfdsf=dsfdsfs?dsfsndfsajdn=sfdjasdhads=test.xlsx";
// starting from the last '=' up to the end of the string
string result = source.Substring(source.LastIndexOf('=') + 1);

Another option:
string source = "www.asdsad.de/dsfdsf/sdfdsf=dsfdsfs?dsfsndfsajdn=sfdjasdhads=test.xlsx";
Stack<char> sb = new Stack<char>();
// walk backwards (including index 0) until the first '=' from the end
for (var i = source.Length - 1; i >= 0; i--)
{
    if (source[i] == '=')
    {
        break;
    }
    sb.Push(source[i]);
}
// the stack hands the characters back in their original order
var result = new string(sb.ToArray());

Extracting data from plain text string

I am trying to process a report from a system which gives me the following code
000=[GEN] OK {Q=1 M=1 B=002 I=3e5e65656-e5dd-45678-b785-a05656569e}
I need to extract the values between the curly brackets {} and save them in to variables. I assume I will need to do this using regex or similar? I've really no idea where to start!! I'm using c# asp.net 4.
I need the following variables
param1 = 000
param2 = GEN
param3 = OK
param4 = 1 //Q
param5 = 1 //M
param6 = 002 //B
param7 = 3e5e65656-e5dd-45678-b785-a05656569e //I
I will name the params based on what they actually mean. Can anyone please help me here? I have tried to split based on spaces, but I get the other garbage with it!
Thanks for any pointers/help!

If the format is pretty constant, you can use .NET string processing methods to pull out the values, something along the lines of
string line =
    "000=[GEN] OK {Q=1 M=1 B=002 I=3e5e65656-e5dd-45678-b785-a05656569e}";
int start = line.IndexOf('{');
int end = line.IndexOf('}');
// take only the text between the braces, excluding both of them
string variablePart = line.Substring(start + 1, end - start - 1);
string[] variables = variablePart.Split(' ');
foreach (string variable in variables)
{
    string[] parts = variable.Split('=');
    // parts[0] holds the variable name, parts[1] holds the value
}
Wrote this off the top of my head, so there may be an off-by-one error somewhere. Also, it would be advisable to add error checking e.g. to make sure the input string has both a { and a }.
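The error checking mentioned above could look roughly like the following sketch. ReportParser and ParseBraces are hypothetical names, and returning an empty dictionary for malformed input is an assumption; throwing an exception may suit your code better:

```csharp
using System;
using System.Collections.Generic;

public static class ReportParser
{
    // Hypothetical helper: returns the name=value pairs found between
    // the first '{' and the next '}', or an empty dictionary when
    // either brace is missing.
    public static Dictionary<string, string> ParseBraces(string line)
    {
        var values = new Dictionary<string, string>();
        int start = line.IndexOf('{');
        int end = start < 0 ? -1 : line.IndexOf('}', start + 1);
        if (start < 0 || end < 0) return values;

        string variablePart = line.Substring(start + 1, end - start - 1);
        string[] variables =
            variablePart.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string variable in variables)
        {
            // Split on the first '=' only, so a value may itself contain '='.
            string[] parts = variable.Split(new[] { '=' }, 2);
            if (parts.Length == 2) values[parts[0]] = parts[1];
        }
        return values;
    }
}
```

With the sample line, ParseBraces returns the keys Q, M, B and I, so the caller can read values["B"] instead of relying on positions.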

I would suggest a regular expression for this type of work.
var objRegex = new System.Text.RegularExpressions.Regex(@"^(\d+)=\[([A-Z]+)\] ([A-Z]+) \{Q=(\d+) M=(\d+) B=(\d+) I=([a-z0-9\-]+)\}$");
var objMatch = objRegex.Match("000=[GEN] OK {Q=1 M=1 B=002 I=3e5e65656-e5dd-45678-b785-a05656569e}");
if (objMatch.Success)
{
    // groups 1 to 7 correspond to param1 to param7 in the question
    Console.WriteLine(objMatch.Groups[1].ToString());
    Console.WriteLine(objMatch.Groups[2].ToString());
    Console.WriteLine(objMatch.Groups[3].ToString());
    Console.WriteLine(objMatch.Groups[4].ToString());
    Console.WriteLine(objMatch.Groups[5].ToString());
    Console.WriteLine(objMatch.Groups[6].ToString());
    Console.WriteLine(objMatch.Groups[7].ToString());
}
I've just tested this out and it works well for me.

Use a regular expression.
Quick and dirty attempt:
(?<ID1>[0-9]*)=\[(?<GEN>[a-zA-Z]*)\] OK {Q=(?<Q>[0-9]*) M=(?<M>[0-9]*) B=(?<B>[0-9]*) I=(?<I>[a-zA-Z0-9\-]*)}
This will generate named groups called ID1, GEN, Q, M, B and I.
Check out the MSDN docs for details on using Regular Expressions in C#.
You can use Regex Hero for quick C# regex testing.
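As a rough illustration of reading those named groups from C#, here is a sketch that uses the pattern above with the literal braces escaped. NamedGroupDemo and Extract are hypothetical names, and returning null for a non-matching line is an assumption:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // The pattern from the answer, with the literal braces escaped for clarity.
    private static readonly Regex Pattern = new Regex(
        @"(?<ID1>[0-9]*)=\[(?<GEN>[a-zA-Z]*)\] OK \{Q=(?<Q>[0-9]*) M=(?<M>[0-9]*) B=(?<B>[0-9]*) I=(?<I>[a-zA-Z0-9\-]*)\}");

    // Hypothetical helper: returns the value captured by one named group,
    // or null when the line does not match the pattern at all.
    public static string Extract(string line, string groupName)
    {
        Match match = Pattern.Match(line);
        return match.Success ? match.Groups[groupName].Value : null;
    }
}
```

For the sample line, Extract(line, "B") gives 002 and Extract(line, "GEN") gives GEN. Because OK is hard-coded in the pattern, lines with any other status will not match.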

You can use String.Split
string[] parts = s.Split(new string[] {"=[", "] ", " {Q=", " M=", " B=", " I=", "}"},
StringSplitOptions.None);
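To show what that call returns, here is a small sketch wrapping it in a method. SplitDemo and SplitReport are hypothetical names; the separators are exactly the ones above:

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper wrapping the Split call from the answer.
    public static string[] SplitReport(string s)
    {
        return s.Split(
            new string[] { "=[", "] ", " {Q=", " M=", " B=", " I=", "}" },
            StringSplitOptions.None);
    }
}
```

For the sample line, parts[0] through parts[6] hold 000, GEN, OK, 1, 1, 002 and the I value. With StringSplitOptions.None there is also a trailing empty string in parts[7], because the line ends with the "}" separator. Keeping None rather than RemoveEmptyEntries means an empty value in the middle does not shift the later positions.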

This solution breaks up your report code into segments and stores the desired values into an array.
The regular expression matches one report code segment at a time and stores the appropriate values in the "Parsed Report Code Array".
As your example implied, the first two code segments are treated differently than the ones after that. I made the assumption that it is always the first two segments that are processed differently.
private static string[] ParseReportCode(string reportCode) {
    const int FIRST_VALUE_ONLY_SEGMENT = 3;
    const int GRP_SEGMENT_NAME = 1;
    const int GRP_SEGMENT_VALUE = 2;
    Regex reportCodeSegmentPattern = new Regex(@"\s*([^\}\{=\s]+)(?:=\[?([^\s\]\}]+)\]?)?");
    Match matchReportCodeSegment = reportCodeSegmentPattern.Match(reportCode);
    List<string> parsedCodeSegmentElements = new List<string>();
    int segmentCount = 0;
    while (matchReportCodeSegment.Success) {
        // the first two segments contribute their name as well as their value
        if (++segmentCount < FIRST_VALUE_ONLY_SEGMENT) {
            string segmentName = matchReportCodeSegment.Groups[GRP_SEGMENT_NAME].Value;
            parsedCodeSegmentElements.Add(segmentName);
        }
        string segmentValue = matchReportCodeSegment.Groups[GRP_SEGMENT_VALUE].Value;
        if (segmentValue.Length > 0) parsedCodeSegmentElements.Add(segmentValue);
        matchReportCodeSegment = matchReportCodeSegment.NextMatch();
    }
    return parsedCodeSegmentElements.ToArray();
}

Replacing / with regex

I have a question regarding replacing some characters with regex or any other best practice or efficient way.
Here is what I have as input, it has mostly the same form: A/ABC/N/ABC/123
The output should look like this: A_ABC_NABC123, basically the first 2 / should be changed to _ and the rest removed.
Of course I could do it with several String.Replace calls, one by one, but I don't think that is a good way to do it. I'm looking for a better solution.
So how to do it with Regex?

This will do it, although there may be a simpler way:
static class CustomReplacer
{
public static string Replace(string input)
{
int i = 0;
return Regex.Replace(input, "/", m => i++ < 2 ? "_" : "");
}
}
var replaced = CustomReplacer.Replace("A/ABC/N/ABC/123");
I've wrapped the code like this to make sure you don't accidentally reuse the int variable.
Edit: There's also this overload which stops after a certain number of replacements, but you'd have to do it in two steps: replace the first two / with _, then replace the remaining / with nothing.
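The two-step variant could be sketched as follows, using the instance overload Regex.Replace(input, replacement, count). TwoStepReplacer is a hypothetical name used only for this illustration:

```csharp
using System.Text.RegularExpressions;

public static class TwoStepReplacer
{
    private static readonly Regex Slash = new Regex("/");

    public static string Replace(string input)
    {
        // Step 1: replace at most the first two slashes with underscores.
        string firstTwo = Slash.Replace(input, "_", 2);
        // Step 2: remove every slash that is left.
        return firstTwo.Replace("/", "");
    }
}
```

TwoStepReplacer.Replace("A/ABC/N/ABC/123") returns A_ABC_NABC123. This version keeps no counter between calls, at the cost of scanning the string twice.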

Try this:
string st = "A/ABC/N/ABC/123";
string [] arrStr = st.Split(new char[] { '/' });
st = string.Empty;
for (int i = 0; i < arrStr.Length; i++)
{
if (i < 2)
st += arrStr[i] + "_";
else
st += arrStr[i];
}

What's the most efficient way to format the following string?

I have a very simple question, and I shouldn't be hung up on this, but I am. Haha!
I have a string that I receive in the following format(s):
123
123456-D53
123455-4D
234234-4
123415
The desired output, post formatting, is:
123-455-444
123-455-55
123-455-5
or
123-455
The format is ultimately dependent upon the total number of characters in the original string.
I have several ideas of how to do this, but I keep thinking there's a better way than string.Replace and concatenation...
Thanks for the suggestions..
Ian

Tanascius is right, but I can't comment or upvote due to my lack of rep. If you want additional info on string.Format, I've found this helpful:
http://blog.stevex.net/string-formatting-in-csharp/

I assume this does not merely rely upon the inputs always being numeric? If so, I'm thinking of something like this
private string ApplyCustomFormat(string input)
{
StringBuilder builder = new StringBuilder(input.Replace("-", ""));
int index = 3;
while (index < builder.Length)
{
builder.Insert(index, "-");
index += 4;
}
return builder.ToString();
}

Here's a method that uses a combination of regular expressions and LINQ to extract groups of three letters at a time and then joins them together again. Note: it assumes that the input has already been validated. The validation can also be done with a regular expression.
string s = "123456-D53";
string[] groups = Regex.Matches(s, @"\w{1,3}")
    .Cast<Match>()
    .Select(match => match.Value)
    .ToArray();
string result = string.Join("-", groups);
Result:
123-456-D53
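The validation step mentioned above might look like this sketch. The accepted shape (letters and digits with at most one dash between them) is an assumption read off the sample inputs in the question, and GroupFormatter, Format and the null return for invalid input are hypothetical choices:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupFormatter
{
    // Assumed input shape: alphanumerics with at most one dash in between.
    private static readonly Regex ValidInput =
        new Regex(@"^[A-Za-z0-9]+(-[A-Za-z0-9]+)?$");

    // Returns the regrouped string, or null when the input is not valid.
    public static string Format(string s)
    {
        if (s == null || !ValidInput.IsMatch(s)) return null;

        // Same grouping idea as above: runs of up to three characters.
        var groups = Regex.Matches(s, @"[A-Za-z0-9]{1,3}")
                          .Cast<Match>()
                          .Select(match => match.Value);
        return string.Join("-", groups);
    }
}
```

GroupFormatter.Format("123456-D53") gives 123-456-D53, while input containing spaces or other punctuation is rejected.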

EDIT: See history for old versions.
You could use char.IsDigit() for finding digits, only.
var output = new StringBuilder();
var digitCount = 0;
foreach( var c in input )
{
if( char.IsDigit( c ) )
{
output.Append( c );
digitCount++;
if( digitCount % 3 == 0 )
{
output.Append( "-" );
}
}
}
// Remove possible last -
return output.ToString().TrimEnd('-');
This code should fill from left to right (now I got it, first read, then code) ...
Sorry, I still can't test this right now.

Not the fastest, but easy on the eyes (ed: to read):
string Normalize(string value)
{
if (String.IsNullOrEmpty(value)) return value;
int appended = 0;
var builder = new StringBuilder(value.Length + value.Length/3);
for (int ii = 0; ii < value.Length; ++ii)
{
if (Char.IsLetterOrDigit(value[ii]))
{
builder.Append(value[ii]);
if ((++appended % 3) == 0) builder.Append('-');
}
}
return builder.ToString().TrimEnd('-');
}
Uses a guess to pre-allocate the StringBuilder's length. This will accept any Alphanumeric input with any amount of junk being added by the user, including excess whitespace.
