Replace newline symbols in a long JSON string [closed] - c#

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I have a file that contains a JSON string. A long string: approximately 700k characters.
I'm trying to deserialize it.
But it contains characters like \r and \n that should be replaced with a comma (,).
I've tried to do it with Regex, but it gets stuck without any error.
private static readonly Regex Pattern = new Regex("(\r\n|\r|\n)", RegexOptions.Compiled | RegexOptions.IgnoreCase);
Pattern.Replace(dataString, ",");
Also tried to convert string into StringBuilder and use simple .Replace
private readonly IDictionary<string, string> replacements = new Dictionary<string, string> { { "\r\n", "," }, { "\r", "," }, { "\n", "," } };
foreach (var replacement in this.replacements)
{
    dataStringBuilder.Replace(replacement.Key, replacement.Value);
}
The second approach was better, but only until the file became larger.
So now both approaches get stuck.
Are there any other recommended, faster solutions?

You could use a naïve approach of manually copying the string, converting line breaks yourself. This enables you to iterate the underlying character array only once, and avoids costly reallocations of string/StringBuilder objects:
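// The output can only shrink (\r\n becomes a single comma), so input.Length is enough.
char[] converted = new char[input.Length];
int pos = 0;
bool lastWasCr = false;
foreach (char c in input)
{
    if (c == '\r')
    {
        // Carriage return: always emit a comma and remember that we saw it.
        converted[pos++] = ',';
        lastWasCr = true;
    }
    else
    {
        if (c == '\n')
        {
            // A \n directly after \r is the second half of \r\n: skip it.
            if (!lastWasCr)
                converted[pos++] = ',';
        }
        else
            converted[pos++] = c;
        lastWasCr = false;
    }
}
string output = new string(converted, 0, pos);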
This loop iterates over every character, detecting and replacing line breaks. Note that we have to keep track of the most recent carriage return (\r) to avoid emitting two commas for a Windows line break (\r\n).
I compared your two approaches with the code above, using a random 650 KB text file and performing 1000 iterations of each implementation.
Results:
Regex.Replace: 62.3233sec (this does not even include initialization like compiling the regex)
StringBuilder.Replace: 7.0622sec (fixed version as indicated in a comment to your question)
Char-wise loop with if statement: 2.3862sec
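If you want to run a comparison like this on your own data, the following is a minimal sketch, not the harness that produced the numbers above. It wraps the three approaches as static methods of a hypothetical LineBreakReplacer class. Two details differ from the question's code: the result of Regex.Replace is kept (strings are immutable, so the return value must be used), and the StringBuilder version uses an array so that "\r\n" is always replaced before "\r" and "\n", because the enumeration order of a Dictionary is not guaranteed.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class LineBreakReplacer
{
    private static readonly Regex LineBreak = new Regex("\r\n|\r|\n", RegexOptions.Compiled);

    // Ordered so that "\r\n" is handled before its single-character halves.
    private static readonly string[] Breaks = { "\r\n", "\r", "\n" };

    // Approach 1: Regex. Replace returns a new string; it does not modify the input.
    public static string WithRegex(string input) => LineBreak.Replace(input, ",");

    // Approach 2: StringBuilder.Replace, one pass per line-break variant.
    public static string WithStringBuilder(string input)
    {
        var sb = new StringBuilder(input);
        foreach (var lineBreak in Breaks)
            sb.Replace(lineBreak, ",");
        return sb.ToString();
    }

    // Approach 3: the single-pass character loop from the answer, wrapped in a method.
    public static string WithCharLoop(string input)
    {
        char[] converted = new char[input.Length];
        int pos = 0;
        bool lastWasCr = false;
        foreach (char c in input)
        {
            if (c == '\r')
            {
                converted[pos++] = ',';
                lastWasCr = true;
            }
            else
            {
                if (c != '\n')
                    converted[pos++] = c;
                else if (!lastWasCr)
                    converted[pos++] = ','; // lone \n; a \n after \r is skipped
                lastWasCr = false;
            }
        }
        return new string(converted, 0, pos);
    }

    // Measures the total time of `iterations` calls of `replace` on `input`.
    public static TimeSpan Time(Func<string, string> replace, string input, int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            replace(input);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example, `LineBreakReplacer.Time(LineBreakReplacer.WithCharLoop, json, 1000)` returns the elapsed time for 1000 runs, where `json` is whatever you loaded from your file (for instance with File.ReadAllText). Absolute timings depend on the machine and runtime, so expect numbers different from the ones above.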


Merge 2 numeric strings alternately every Nth place [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 11 months ago.
This post was edited and submitted for review 11 months ago and failed to reopen the post:
Original close reason(s) were not resolved
I have 2 Numeric strings with commas:
8,1,6,3,16,9,14,11,24,17,22,19
and
2,7,4,5,10,15,12,13,18,23,20,21
and I need to merge them alternately, every Nth place
(for example, every 4th place, to get)
8,1,2,7,6,3,4,5,16,9,10,15,14,11,12,13,24,17,18,23,22,19,20,21
I've already examined all recommended solutions but nothing worked for me.
Here's my current progress:
string result = "";
// For every index in the strings
for (int i = 0; i < JoinedWithComma1.Length || i < JoinedWithComma2.Length; i = i + 2)
{
    // First choose the ith character of the
    // first string if it exists
    if (i < JoinedWithComma1.Length)
        result += JoinedWithComma1[i];
    // Then choose the ith character of the
    // second string if it exists
    if (i < JoinedWithComma2.Length)
        result += JoinedWithComma2[i];
}
Appreciate any assistance.
You can't rely on the length of the strings or select the "ith character" because not all "elements" (read: numbers) have the same number of characters. You should split the strings so you can get the elements out of the result arrays instead:
string JoinedWithComma1 = "8,1,6,3,16,9,14,11,24,17,22,19";
string JoinedWithComma2 = "2,7,4,5,10,15,12,13,18,23,20,21";
var split1 = JoinedWithComma1.Split(',');
var split2 = JoinedWithComma2.Split(',');
if (split1.Length != split2.Length)
{
// TODO: decide what you want to happen when the two strings
// have a different number of "elements".
throw new Exception("Oops!");
}
Then, you can easily write a for loop to merge the two lists:
var merged = new List<string>();
for (int i = 0; i < split1.Length; i += 2)
{
    if (i + 1 < split1.Length)
    {
        merged.AddRange(new[] { split1[i], split1[i + 1],
                                split2[i], split2[i + 1] });
    }
    else
    {
        merged.AddRange(new[] { split1[i], split2[i] });
    }
}
string result = string.Join(",", merged);
Console.WriteLine(result); // 8,1,2,7,6,3,4,5,16,9,10,15,14,11,12,13,24,17,18,23,22,19,20,21
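The loop above always takes two elements from each list. A minimal sketch that turns that block size into a parameter, assuming "every Nth place" means N/2 elements from each string (so blockSize = 2 reproduces the "every 4th place" example). The AlternatingMerge class and Merge method are hypothetical names; when one list is longer, its extra blocks are simply emitted without a partner instead of throwing.

```csharp
using System;
using System.Collections.Generic;

public static class AlternatingMerge
{
    // Takes blockSize elements from the first list, then blockSize from the second,
    // and repeats until both lists are exhausted.
    public static string Merge(string first, string second, int blockSize)
    {
        if (blockSize < 1)
            throw new ArgumentOutOfRangeException(nameof(blockSize));

        string[] a = first.Split(',');
        string[] b = second.Split(',');
        var merged = new List<string>(a.Length + b.Length);

        for (int start = 0; start < a.Length || start < b.Length; start += blockSize)
        {
            // Copy one block from each list, skipping indexes past the end.
            for (int i = start; i < start + blockSize && i < a.Length; i++)
                merged.Add(a[i]);
            for (int i = start; i < start + blockSize && i < b.Length; i++)
                merged.Add(b[i]);
        }
        return string.Join(",", merged);
    }
}
```

With the strings from the question, `AlternatingMerge.Merge(JoinedWithComma1, JoinedWithComma2, 2)` yields the expected result, and a block size of 1 gives a plain one-by-one alternation.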
If you write a regular expression to get you a pair of numbers:
var r = new Regex(@"\d+,\d+");
You can break each string into a sequence of pairs:
var s1pairs = r.Matches(s1).Cast<Match>().Select(m => m.ToString());
var s2pairs = r.Matches(s2).Cast<Match>().Select(m => m.ToString());
And you can zip the sequences
var zipped = s1pairs.Zip(s2pairs,(a,b)=>a+","+b);
And join the bits together with commas
var result = string.Join(",", zipped);
How does it work?
The Regex matches one or more digits, followed by a comma, followed by one or more digits.
In a string of
1,2,3,4,5,6
It matches 3 times:
1,2
3,4
5,6
Matches returns a MatchCollection containing all these matches. To be compatible with LINQ's Select, you need to Cast the MatchCollection to an IEnumerable<Match>. This is because MatchCollection predates the invention of IEnumerable<T>, so its enumerator returns objects that need casting. (On .NET Core 2.0 and later, MatchCollection also implements IEnumerable<Match>, so the Cast is no longer required there.) Once turned into an IEnumerable<Match>, each match can be ToString'd by a Select, producing a sequence of strings that are pairs of numbers separated by a comma. s1pairs is effectively a collection of pairs of numbers:
new []{ "1,2", "3,4", "5,6" }
Repeat the same for string 2
Zip the sequences. As you might imagine from the name, Zip works like a zipper on clothing: it pairs the first element of A with the first element of B, the second with the second, and so on, combining each pair with the lambda. So two sequences of
new [] { "1,2", "3,4" }
new [] { "A,B", "C,D" }
end up, when zipped, as
new [] { "1,2,A,B", "3,4,C,D" }
And all that remains is to join it back together with a comma
"1,2,A,B,3,4,C,D"
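Put together, the regex-and-Zip steps look roughly like this (a sketch; the PairZipper class and MergePairs method are hypothetical names). Two caveats follow from how the pieces work: with an odd number of elements the last number has no partner, so the regex does not match it and it is dropped, and Zip stops at the end of the shorter sequence.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PairZipper
{
    // One or more digits, a comma, one or more digits: a single "pair".
    private static readonly Regex Pair = new Regex(@"\d+,\d+");

    public static string MergePairs(string s1, string s2)
    {
        // Break each string into its sequence of pairs, e.g. "8,1", "6,3", ...
        var s1pairs = Pair.Matches(s1).Cast<Match>().Select(m => m.Value);
        var s2pairs = Pair.Matches(s2).Cast<Match>().Select(m => m.Value);

        // Combine the first pair of s1 with the first pair of s2, and so on.
        var zipped = s1pairs.Zip(s2pairs, (a, b) => a + "," + b);

        // Join the combined pairs back into one comma-separated string.
        return string.Join(",", zipped);
    }
}
```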

c# find keywords in a string and remove it [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I have a list of keywords, and I wrote a method that should remove a keyword from a string if the string contains one of the keywords from the list. Here is the method:
private string RemoveFromList(string sentence)
{
    var lists = new List<string> { "ask-", "que-", "(app)", "(exe)", "(foo)" };
    var control = lists.Any(sentence.Contains);
    string result;
    if (control)
    {
        var index = sentence.IndexOf(lists.FirstOrDefault(sentence.Contains)
            ?? throw new InvalidOperationException(), StringComparison.Ordinal);
        result = index != -1 ? sentence.Remove(index) : sentence;
    }
    else
        result = sentence;
    return result;
}
var str = "ask- This is a sentence.";
Message.Box(RemoveFromList(str));
// Expected "This is a sentence.", but that is not what I get.
This method does not work properly. It does not remove the keyword from the string.
Using string.Replace is the simplest approach:
foreach (var word in lists)
{
    sentence = sentence.Replace(word, "").Trim();
}
Note that this also removes the word when it appears in the middle of the string. If you want to remove it only at the start, you could use IndexOf, check that the result is 0, and then take the substring starting at word.Length. Or use StartsWith:
foreach (var word in lists)
{
    if (sentence.StartsWith(word))
    {
        sentence = sentence.Substring(word.Length).Trim();
        // break; // if only one
    }
}
There are 2 options for you.
First of all, your usage of Remove is incorrect. You only want to remove the keyword, but if you pass just 1 argument to Remove, it removes everything from that index to the end of the string. Pass the length of the keyword as the second argument to Remove (and keep the result, because strings are immutable):
s = s.Remove(index, len);
Second, if the string contains the keyword, replace the occurrence of the keyword with an empty string:
s = s.Replace("keyword", "");
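Applied to the method from the question, the first option could look like the following minimal sketch (KeywordRemover is a hypothetical class name; the keyword list is copied from the question). It removes every occurrence of every keyword with Remove(index, keyword.Length) and trims the space that a leading keyword such as "ask- " leaves behind. It does not collapse the double space left when a keyword sits in the middle of a sentence.

```csharp
using System;
using System.Collections.Generic;

public static class KeywordRemover
{
    // Keyword list copied from the question.
    private static readonly List<string> Keywords =
        new List<string> { "ask-", "que-", "(app)", "(exe)", "(foo)" };

    public static string RemoveFromList(string sentence)
    {
        foreach (var keyword in Keywords)
        {
            int index;
            // Remove(index, count) deletes only the keyword, not the rest of the string.
            while ((index = sentence.IndexOf(keyword, StringComparison.Ordinal)) != -1)
            {
                sentence = sentence.Remove(index, keyword.Length);
            }
        }
        // Drop the whitespace left at the start or end by a removed keyword.
        return sentence.Trim();
    }
}
```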
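Another option is to create an extension method, since you already know which items to remove. Extension methods must live in a static class:
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    public static string RemoveFromList(this string sentence)
    {
        new List<string> { "ask-", "que-", "(app)", "(exe)", "(foo)" }.ForEach(name =>
        {
            // Remove the keyword, then collapse any run of two or more spaces it leaves behind.
            sentence = Regex.Replace(sentence.Replace(name, string.Empty), " {2,}", " ");
        });
        // Trim the single space left when a keyword was at the start or end.
        return sentence.Trim();
    }
}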
Usage
var str = "ask- This is (app) a que- sentence.".RemoveFromList();
Note
I used Regex.Replace because removing the unwanted strings can leave extra blank spaces behind; collapsing every run of spaces into a single space ensures that doesn't happen.

How to check if a file line contains only the string searched? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
Suppose I have a file with multiple line:
En;15;
Vu;
US;32;
I need to check whether the string Vu is contained, so I did:
string text = @"En;15;
Vu;
US;32;";
var exist = text.Contains("Vu");
This returns true, but I also need to check whether the line with Vu contains only Vu, or other content as well, like the other lines. How can I do this? Thanks.
UPDATE
If the line also contains other elements, it should return false.
Split your string into an array using the Split() method with the separator '\n', which represents a new line. Each line then ends up at a separate index of the string array:
using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        string stringfromfile = @"En;
15;
Vu;
US;
32;";
        string[] ar = stringfromfile.Split('\n');
        // remove the ";" character and trim white space (including any '\r') for safety
        for (int i = 0; i < ar.Length; i++)
        {
            ar[i] = ar[i].Replace(";", "").Trim();
        }
        // Contains on an array is a LINQ extension method (System.Linq)
        if (ar.Contains("Vu"))
        {
            Console.WriteLine("TRUE");
        }
        else
        {
            Console.WriteLine("FALSE");
        }
        foreach (var itm in ar)
        {
            Console.WriteLine(itm);
        }
        Console.ReadLine();
    }
}
Assuming that in your input a "newline" is "\n", and that things in a line are separated by ";", then:
1) break the text into separate lines
2) for each line, break the line into pieces separated by ";"
3) for each broken-line, check each piece and see if Vu is there
4) ..and if it is, hey, you found it
5) ..and if that broken-line had just 1 piece, hey, it was a single Vu
Bits of code:
1)
@"text
that
has
lines".Split('\n') ==> array of 4 lines
2)
"linethat;has;pieces".Split(';') ===> array of 3 pieces
3) "for each" is a foreach loop. You will need one for lines and one for pieces, one inside the other.
4) Split removes the separator, so ";" won't show up in a piece, so if(piece == "Vu")
5) "pieces" is an array of strings, so if(pieces.Length==1) means that the line had a single piece. Watch out for the trailing ";" in your data: "Vu;".Split(';') gives { "Vu", "" }, so pass StringSplitOptions.RemoveEmptyEntries to drop the empty piece.
Now you have all the bits, just use them properly.
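Assembled, the five steps could look like this minimal sketch. LineCheck and HasLineWithOnly are hypothetical names, and it assumes "\n" line endings (a trailing "\r" from Windows line endings is removed by Trim) and ";" as the separator.

```csharp
using System;

public static class LineCheck
{
    // Returns true if some line of `text` consists of exactly one piece equal to `value`.
    public static bool HasLineWithOnly(string text, string value)
    {
        // 1) break the text into lines
        foreach (var rawLine in text.Split('\n'))
        {
            // 2) break the line into pieces; Trim drops a trailing '\r' and
            //    RemoveEmptyEntries drops the empty piece after a trailing ';'
            string[] pieces = rawLine.Trim().Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

            // 3) check each piece
            foreach (var piece in pieces)
            {
                // 4) found it, and 5) it is the only piece on that line
                if (piece == value && pieces.Length == 1)
                    return true;
            }
        }
        return false;
    }
}
```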
You could try searching very simply for "\nVu\n", so it would look something like this:
var exist = text.Contains("\nVu\n");
This would find all cases except for the special cases of the first and last line, and is more efficient than splitting the string into an array. (With the sample data, where the line is Vu; rather than Vu, search for "\nVu;\n" instead.)
And as @quetzalcoatl said:
..and first and last line can be covered by .StartsWith("Vu\n") and .EndsWith("\nVu")..
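Combining the Contains check with the StartsWith/EndsWith cases gives something like this sketch (LineSearch and ContainsWholeLine are hypothetical names). It assumes "\n" line endings, so text with Windows "\r\n" line endings would need to be normalized first, and it also covers the case where the whole text is a single line.

```csharp
using System;

public static class LineSearch
{
    // True if `line` appears as a complete line of `text`, using only string searches.
    public static bool ContainsWholeLine(string text, string line) =>
        text == line                                                  // the only line
        || text.StartsWith(line + "\n", StringComparison.Ordinal)     // first line
        || text.EndsWith("\n" + line, StringComparison.Ordinal)       // last line
        || text.Contains("\n" + line + "\n");                         // any middle line
}
```

With the question's data, call it with "Vu;" as the line to look for, since the semicolon is part of the line.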

How to read from file and make a HashMap [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
Update: to make it clearer
I have a txt file that looks like this:
a [456545324553645,43456765676564,62644456374,65768475336846,...]
b [3445324553645,4546465676564,07674456374,0906847534657,...]
c [21245324553645,43456765676564,62644456374,6576847534657,...]
d [133426545324553645,43456765676564,62644456374,6576847534657,...]
f [1243545324553645,43456765676564,62644456374,6576847534657,...]
g [356545324553645,43456765676564,62644456374,6576847534657,...]
I want to read the file and make a HashMap,
that is, I want to store the char in a String variable as the Key of the HashMap
and store the numbers in a String[] as the Value of the HashMap.
var lines = File.ReadAllLines("filename.txt");
var results = lines.Select(line => line.Split(' '))
.Select(split => new { Character = split[0], Number = split[1] });
// this is your data, now you can play with it
string allChars = string.Join(string.Empty, results.Select(r => r.Character));
string[] allNumbers = results.Select(r => r.Number).ToArray();
You need to read each line of a file, split it in two, then add each part to wherever it needs to go.
string character = "";
string[] numbers; // to be calculated later
var numberList = new List<string>(); // for ease of adding values
using (var file = File.OpenText(pathToFile))
{
    while (!file.EndOfStream)
    {
        var lineParts = file.ReadLine().Split(' '); // split line around space characters
        character += lineParts[0];
        numberList.Add(lineParts[1]);
    }
}
numbers = numberList.ToArray();
There are a couple of things to point out here that are good practice.
We don't know how big the file is (it could be thousands of lines), so we avoid reading the whole thing at once; instead, we only read as much as we need at a time, in this case a single line.
We're not adding straight to the array. Because of the above, we can't easily work out how many lines there are going to be, so we can't say how big the array needs to be. Instead we add to a List, and turn it into an array later. If you don't need the array, you don't even have to do that: you can just work with the list.
The line character += lineParts[0] isn't ideal: it creates extra String objects which then have to be thrown away. Instead, we could use a StringBuilder:
var characterBuilder = new StringBuilder();
...
characterBuilder.Append(lineParts[0]);
...
character = characterBuilder.ToString();
This becomes more relevant as your file gets bigger.
Update
If you want to create a hashmap, you're better off creating that from the beginning:
var numbers = new Dictionary<string, string>();
using (var file = File.OpenText(pathToFile))
{
    while (!file.EndOfStream)
    {
        var lineParts = file.ReadLine().Split(" ".ToCharArray(), 2); // split line around space characters
        numbers[lineParts[0]] = lineParts[1];
    }
}
You'll note that I'm using a different overload of string.Split. It takes an int that specifies the maximum number of parts to produce.
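The dictionary above stores the whole bracketed text (for example "[456545324553645,...]") as one string, while the question asks for a String[] per key. A minimal sketch that goes one step further, assuming every line has the shape "key [n1,n2,...]" as in the sample (KeyedNumbersParser is a hypothetical name). The numbers stay strings, as the question asks, which also preserves leading zeros such as in 07674456374.

```csharp
using System;
using System.Collections.Generic;

public static class KeyedNumbersParser
{
    // Parses lines like "a [123,456,789]" into a map from "a" to { "123", "456", "789" }.
    public static Dictionary<string, string[]> Parse(IEnumerable<string> lines)
    {
        var map = new Dictionary<string, string[]>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // skip blank lines

            // Split into at most two parts: the key and the bracketed list.
            var parts = line.Split(new[] { ' ' }, 2);
            if (parts.Length < 2)
                continue; // assumption: silently ignore malformed lines

            // Strip the surrounding brackets, then split the list on commas.
            var list = parts[1].Trim().TrimStart('[').TrimEnd(']');
            map[parts[0]] = list.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        }
        return map;
    }
}
```

Passing `File.ReadLines(pathToFile)` as the input keeps the line-by-line reading described above, since File.ReadLines enumerates the file lazily instead of loading it all at once.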

C# - split string [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I have some problems with splitting and checking a string.
I need to split a string, swap the halves, and check whether the result is the same as the second string.
Example: first string = tokyo, second string = kyoto
So: S = a + b, and the second string = b + a
S - a = b and S - b = a
a and b are parts of one string (S) and may have different lengths; in this case a = to and b = kyo.
First I need to check the string lengths: if they are different, write an error. That part is easy.
Then I thought I could compare the strings by their ASCII values (case sensitivity is not important), and that could be OK, but...
I can create the string tooky, which has the same ASCII total but is not created by splitting and swapping the parts of the first string...
Any ideas?
static void Main(string[] args)
{
    string S = "tokyo";
    string T = "kyoto";
    if (S.Length == T.Length)
    {
        // split string ?
    }
    else
        Console.WriteLine("This two words are different. No result found.");
    Console.Read();
}
I would suggest doing the comparisons with strings. You can use the String.ToLower() method to convert them both to lowercase for comparison.
I am not exactly sure what problem you are trying to solve, but from what I understand you are trying to check whether string S can be split into two substrings that can be rearranged to make string T.
To check this you will want something similar to the following
bool result = false;
for (int i = 0; i < S.Length; i++)
{
    string back = S.Substring(i);
    string front = S.Substring(0, i);
    // call ToLower() on both strings first if case should be ignored
    if (T.Equals(back + front))
        result = true;
}
Hope this helps
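A well-known shortcut for the same check, named plainly here since it is a different technique from the loop above: T can be written as b + a for some split S = a + b exactly when both strings have the same length and T occurs inside S + S. A minimal sketch (RotationCheck and IsRotation are hypothetical names), lowercasing both strings because the question says case does not matter:

```csharp
using System;

public static class RotationCheck
{
    // True if t equals b + a for some split s = a + b (case-insensitive).
    public static bool IsRotation(string s, string t)
    {
        if (s.Length != t.Length)
            return false;

        // Every rotation of s appears as a substring of s + s.
        string doubled = (s + s).ToLowerInvariant();
        return doubled.Contains(t.ToLowerInvariant());
    }
}
```

This also rejects the tooky case from the question: "tokyotokyo" contains kyoto but not tooky, even though both words have the same letters.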
If you want to compare equality of two collections you should consider using LINQ:
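static void Main(string[] args)
{
    string S = "tokyo";
    string T = "kyoto";
    if (S.Length == T.Length)
    {
        // Requires using System.Linq. Sorting both strings and comparing the sequences checks
        // that they contain exactly the same characters; Intersect(...).Any() would only
        // check that they share at least one character.
        if (S.OrderBy(c => c).SequenceEqual(T.OrderBy(c => c)))
        {
            Console.WriteLine("The Contents are the same");
            Console.Read();
        }
    }
    else
        Console.WriteLine("This two words are different. No result found.");
    Console.Read();
}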
