I have a C# function that fixes non-printable characters in strings destined for JavaScript, but it is very slow. How can I speed it up?
private static string JsStringFixNonPrintable(string Source)
{
string Result = "";
for (int Position = 0; Position < Source.Length; ++Position)
{
int i = Position;
var CharCat = char.GetUnicodeCategory(Source, i);
if (Char.IsWhiteSpace(Source[i]) ||
CharCat == System.Globalization.UnicodeCategory.LineSeparator ||
CharCat == System.Globalization.UnicodeCategory.SpaceSeparator) { Result += " "; continue; }
if (Char.IsControl(Source[i]) && Source[i] != 10 && Source[i] != 13) continue;
Result += Source[i];
}
return Result;
}
I have recoded your snippet using the StringBuilder class with a predefined buffer size, which is much faster than your sample.
private static string JsStringFixNonPrintable(string Source)
{
StringBuilder builder = new StringBuilder(Source.Length); // predefine size to be the same as input
for (int it = 0; it < Source.Length; ++it)
{
var ch = Source[it];
var CharCat = char.GetUnicodeCategory(Source, it);
if (Char.IsWhiteSpace(ch) ||
CharCat == System.Globalization.UnicodeCategory.LineSeparator ||
CharCat == System.Globalization.UnicodeCategory.SpaceSeparator) { builder.Append(' '); continue; }
if (Char.IsControl(ch) && ch != 10 && ch != 13) continue;
builder.Append(ch);
}
return builder.ToString();
}
Instead of concatenating to the string, try using System.Text.StringBuilder which internally maintains a character buffer and does not create a new object every time you append.
Example:
StringBuilder sb = new StringBuilder();
sb.Append('a');
sb.Append('b');
sb.Append('c');
string result = sb.ToString();
Console.WriteLine(result); // prints 'abc'
Use StringBuilder:
http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx
and replace characters in place; that should speed things up.
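As a rough illustration of the "in place" idea, here is a minimal sketch that writes into a preallocated char[] and builds the result string once at the end. The class and method names are hypothetical; the filtering rules are copied from the question's function, and the approach relies on the output never being longer than the input.

```csharp
using System;
using System.Globalization;

public static class JsStringSketch
{
    // Hypothetical helper: same filtering rules as JsStringFixNonPrintable,
    // but writing into one preallocated buffer instead of concatenating strings.
    public static string FixNonPrintable(string source)
    {
        var buffer = new char[source.Length]; // output is never longer than the input
        int written = 0;

        for (int i = 0; i < source.Length; i++)
        {
            char ch = source[i];
            UnicodeCategory category = char.GetUnicodeCategory(source, i);

            if (char.IsWhiteSpace(ch) ||
                category == UnicodeCategory.LineSeparator ||
                category == UnicodeCategory.SpaceSeparator)
            {
                buffer[written++] = ' '; // every kind of whitespace becomes a plain space
                continue;
            }

            // Same condition as the original: skip control characters.
            if (char.IsControl(ch) && ch != 10 && ch != 13) continue;

            buffer[written++] = ch;
        }

        // Only one string allocation, sized to what was actually written.
        return new string(buffer, 0, written);
    }
}
```

Usage: `JsStringSketch.FixNonPrintable("a\tb")` returns `"a b"`.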
I have a list of files and directories, List<string> pathes. I'd like to find the deepest common branch that all of the paths share.
We can assume that they all share a common path, but this is unknown in the beginning.
Let's say I have the following three entries:
C:/Hello/World/This/Is/An/Example/Bla.cs
C:/Hello/World/This/Is/Not/An/Example/
C:/Hello/Earth/Bla/Bla/Bla
This should get the result: C:/Hello/ as Earth is breaking this "chain" of subdirectories.
Second example:
C:/Hello/World/This/Is/An/Example/Bla.cs
C:/Hello/World/This/Is/Not/An/Example/
-> C:/Hello/World/This/Is/
How would you proceed? I tried using string.Split('/'), starting with the first string and checking whether every part of its array is contained in the other strings. However, this would be very expensive, as I'm iterating (list_of_entries)^list_of_entries. Is there a better solution available?
My current attempt would be something like the following (C# + LINQ):
public string CalculateCommonPath(IEnumerable<string> paths)
{
int minSlash = int.MaxValue;
string minPath = null;
foreach (var path in paths)
{
int splits = path.Split('\\').Count();
if (minSlash > splits)
{
minSlash = splits;
minPath = path;
}
}
if (minPath != null)
{
string[] splits = minPath.Split('\\');
for (int i = 0; i < minSlash; i++)
{
if (paths.Any(x => !x.StartsWith(splits[i])))
{
return i >= 0 ? splits.Take(i).ToString() : "";
}
}
}
return minPath;
}
A function to get the longest common prefix may look like this:
public static string GetLongestCommonPrefix(string[] s)
{
int k = s[0].Length;
for (int i = 1; i < s.Length; i++)
{
k = Math.Min(k, s[i].Length);
for (int j = 0; j < k; j++)
if (s[i][j] != s[0][j])
{
k = j;
break;
}
}
return s[0].Substring(0, k);
}
Then you may need to trim the prefix on the right-hand side. E.g. we want to return c:/dir instead of c:/dir/file for
c:/dir/file1
c:/dir/file2
You also may want to normalize the paths before processing. See Normalize directory names in C#.
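The trimming step could be sketched like this. The helper name and the default '/' separator are assumptions for illustration, and the paths are assumed to be normalized already. Note that this sketch always cuts back to the last separator, so a prefix that happens to end exactly on a directory name is shortened as well.

```csharp
using System;

public static class PrefixTrimSketch
{
    // Hypothetical helper: cuts a raw character prefix back to the last
    // separator, so that "c:/dir/file" becomes "c:/dir".
    public static string TrimToDirectory(string prefix, char separator = '/')
    {
        int lastSeparator = prefix.LastIndexOf(separator);

        // No separator at all means there is no shared directory.
        return lastSeparator < 0 ? string.Empty : prefix.Substring(0, lastSeparator);
    }
}
```

Usage: `PrefixTrimSketch.TrimToDirectory("c:/dir/file")` returns `"c:/dir"`.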
I don't know whether this is the best-performing solution (probably not), but it is certainly very easy to implement.
Sort your list alphabetically.
Compare the first entry in that sorted list to the last, character by character, and stop when you find a difference (everything before that point is the longest prefix shared by both strings).
Sample code:
List<string> paths = new List<string>();
paths.Add(@"C:/Hello/World/This/Is/An/Example/Bla.cs");
paths.Add(@"C:/Hello/World/This/Is/Not/An/Example/");
paths.Add(@"C:/Hello/Earth/Bla/Bla/Bla");
List<string> sortedPaths = paths.OrderBy(s => s).ToList();
Console.WriteLine("Most common path here: {0}", sharedSubstring(sortedPaths[0], sortedPaths[sortedPaths.Count - 1]));
And that function of course:
public static string sharedSubstring(string string1, string string2)
{
string ret = string.Empty;
int index = 1;
// stop at the end of the shorter string so that Substring cannot throw
while (index <= string1.Length && index <= string2.Length
&& string1.Substring(0, index) == string2.Substring(0, index))
{
ret = string1.Substring(0, index);
index++;
}
return ret;
} // returns an empty string if no common characters were found
First sort the list of paths to inspect. Then split the first and the last item and compare them segment by segment: if a segment is the same in both, proceed to the next one until you find a difference.
So you only need to sort once and then inspect two items.
To return c:/dir for
c:/dir/file1
c:/dir/file2
I would code it this way:
public static string GetLongestCommonPrefix(params string[] s)
{
return GetLongestCommonPrefix((ICollection<string>)s);
}
public static string GetLongestCommonPrefix(ICollection<string> paths)
{
if (paths == null || paths.Count == 0)
return null;
if (paths.Count == 1)
return paths.First();
var allSplittedPaths = paths.Select(p => p.Split('\\')).ToList();
var min = allSplittedPaths.Min(a => a.Length);
var i = 0;
for (i = 0; i < min; i++)
{
var reference = allSplittedPaths[0][i];
if (allSplittedPaths.Any(a => !string.Equals(a[i], reference, StringComparison.OrdinalIgnoreCase)))
{
break;
}
}
return string.Join("\\", allSplittedPaths[0].Take(i));
}
And here are some tests for it:
[TestMethod]
public void GetLongestCommonPrefixTest()
{
var str1 = @"C:\dir\dir1\file1";
var str2 = @"C:\dir\dir1\file2";
var str3 = @"C:\dir\dir1\file3";
var str4 = @"C:\dir\dir2\file3";
var str5 = @"C:\dir\dir1\file1\file3";
var str6 = @"C:\dir\dir1\file1\file3";
var res = Utilities.GetLongestCommonPrefix(str1, str2, str3);
Assert.AreEqual(@"C:\dir\dir1", res);
var res2 = Utilities.GetLongestCommonPrefix(str1, str2, str3, str4);
Assert.AreEqual(@"C:\dir", res2);
var res3 = Utilities.GetLongestCommonPrefix(str1, str2, str3, str5);
Assert.AreEqual(@"C:\dir\dir1", res3);
var res4 = Utilities.GetLongestCommonPrefix(str5, str6);
Assert.AreEqual(@"C:\dir\dir1\file1\file3", res4);
var res5 = Utilities.GetLongestCommonPrefix(str5);
Assert.AreEqual(str5, res5);
var res6 = Utilities.GetLongestCommonPrefix();
Assert.AreEqual(null, res6);
var res7 = Utilities.GetLongestCommonPrefix(null);
Assert.AreEqual(null, res7);
}
I would iterate over each character in the first path, comparing it with every character in every path (except the first) in the collection of paths:
public string FindCommonPath(List<string> paths)
{
string firstPath = paths[0];
bool same = true;
int i = 0;
string commonPath = string.Empty;
while (same && i < firstPath.Length)
{
for (int p = 1; p < paths.Count && same; p++)
{
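// a path shorter than the first one ends the common prefix instead of throwing
same = i < paths[p].Length && firstPath[i] == paths[p][i];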
}
if (same)
{
commonPath += firstPath[i];
}
i++;
}
return commonPath;
}
You could iterate through the list first to find the shortest path and possibly improve it slightly.
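That suggestion could be sketched as follows: find the shortest length up front, use it as the loop bound, and take a single Substring at the end instead of appending character by character. The class and method names are hypothetical, and the result is a raw character prefix that is not trimmed to a directory boundary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShortestFirstSketch
{
    // Hypothetical variant: bound the scan by the shortest path so that
    // no index can run past the end of any string.
    public static string FindCommonPrefix(IReadOnlyList<string> paths)
    {
        if (paths.Count == 0) return string.Empty;

        int limit = paths.Min(p => p.Length); // no common prefix can be longer than this
        int length = 0;

        while (length < limit)
        {
            char expected = paths[0][length];
            bool same = true;
            for (int p = 1; p < paths.Count && same; p++)
            {
                same = paths[p][length] == expected;
            }
            if (!same) break;
            length++;
        }

        return paths[0].Substring(0, length);
    }
}
```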
The function that gives you the longest common directory path with best possible complexity:
private static string GetCommonPath(IEnumerable<string> files)
{
// O(N, L) = N*L; N - number of strings, L - string length
// if the first and last path from alphabetic order matches, all paths in between match
string first = null;//smallest string
string last = null;//largest string
var comparer = StringComparer.InvariantCultureIgnoreCase;
// find smallest and largest string:
foreach (var file in files.Where(p => !string.IsNullOrWhiteSpace(p)))
{
if (last == null || comparer.Compare(file, last) > 0)
{
last = file;
}
if (first == null || comparer.Compare(file, first) < 0)
{
first = file;
}
}
if (first == null)
{
// the list is empty
return string.Empty;
}
if (first.Length > last.Length)
{
// first should not be longer
var tmp = first;
first = last;
last = tmp;
}
// get minimal length
var count = first.Length;
var found = string.Empty;
const char dirChar = '\\';
var sb = new StringBuilder(count);
for (var idx = 0; idx < count; idx++)
{
var current = first[idx];
var x = char.ToLowerInvariant(current);
var y = char.ToLowerInvariant(last[idx]);
if (x != y)
{
// first and last string character is different - break
return found;
}
sb.Append(current);
if (current == dirChar)
{
// end of dir character
found = sb.ToString();
}
}
if (last.Length > count && last[count] == dirChar)
{
// whole first is common root:
return first;
}
return found;
}
This is considerably more optimized than splitting paths by slash and comparing them:
private static string FindCommonPath(string[] paths) {
var firstPath = paths[0];
var commonPathLength = firstPath.Length;
for (int i = 1; i < paths.Length; i++)
{
var otherPath = paths[i];
var pos = -1;
var checkpoint = -1;
while (true)
{
pos++;
if (pos == commonPathLength)
{
if (pos == otherPath.Length
|| (pos < otherPath.Length
&& (otherPath[pos] == '/' || otherPath[pos] == '\\')))
{
checkpoint = pos;
}
break;
}
if (pos == otherPath.Length)
{
if (pos == commonPathLength
|| (pos < commonPathLength
&& (firstPath[pos] == '/' || firstPath[pos] == '\\')))
{
checkpoint = pos;
}
break;
}
if ((firstPath[pos] == '/' || firstPath[pos] == '\\')
&& (otherPath[pos] == '/' || otherPath[pos] == '\\'))
{
checkpoint = pos;
continue;
}
var a = char.ToLowerInvariant(firstPath[pos]);
var b = char.ToLowerInvariant(otherPath[pos]);
if (a != b)
break;
}
if (checkpoint == 0 && (firstPath[0] == '/' || firstPath[0] == '\\'))
commonPathLength = 1;
else commonPathLength = checkpoint;
if (commonPathLength == -1 || commonPathLength == 0)
return "";
}
return firstPath.Substring(0, commonPathLength);
}
I want to count how often \r\r occurs in a string variable.
For example:
string sampleString = "9zogl22n\r\r\nv4bv79gy\r\r\nkaz73ji8\r\r\nxw0w91qq\r\r\ns05jxqxx\r\r\nw08qsxh0\r\r\nuyggbaec\r\r\nu2izr6y6\r\r\n106iha5t\r";
In this example, the result would be 8.
You can use a regular expression:
var res = Regex.Matches(s, "\r\r").Count;
Or a loop over the string:
var res = 0;
for (int i = 0; i < s.Length - 1; i++)
if(s[i] == '\r' && s[i + 1] == '\r')
res++;
Try this method:
public static int CountStringOccurrences(this string text, string pattern)
{
int count = 0;
int i = 0;
while ((i = text.IndexOf(pattern, i)) != -1)
{
i += pattern.Length;
count++;
}
return count;
}
Usage:
int doubleLinefeedCount = sampleString.CountStringOccurrences("\r\r");
You can use LINQ to select all positions where \r\r start and count them:
Enumerable.Range(0, s.Length).Where(idx => s.IndexOf("\r\r", idx) == idx).Count();
Note that "\r\r\r" will return 2 in above code...
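To make the difference between overlapping and non-overlapping counting explicit, here is a small sketch with a flag for it. The class name, method name and the allowOverlap parameter are hypothetical.

```csharp
using System;

public static class OccurrenceCountSketch
{
    // Counts matches of pattern in text. With allowOverlap the search resumes
    // one character after each match start; otherwise it resumes after the
    // whole match.
    public static int Count(string text, string pattern, bool allowOverlap)
    {
        if (string.IsNullOrEmpty(pattern)) return 0;

        int count = 0;
        int i = 0;
        while ((i = text.IndexOf(pattern, i, StringComparison.Ordinal)) != -1)
        {
            count++;
            i += allowOverlap ? 1 : pattern.Length;
        }
        return count;
    }
}
```

For "\r\r\r" this gives 2 with allowOverlap set to true and 1 with it set to false.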
You can use the trick of splitting the string on your lookup string \r\r:
(this is not really efficient, since it allocates two string arrays, but I post it as a possible solution, particularly if the separated tokens are of any interest to you)
"9zogl22n\r\r\nv4bv79gy\r\r\nkaz73ji8\r\r\nxw0w91qq\r\r\ns05jxqxx\r\r\nw08qsxh0\r\r\nuyggbaec\r\r\nu2izr6y6\r\r\n106iha5t\r".Split(new string[] {"\r\r"}, StringSplitOptions.None).Length - 1
You can write a simple extension method for that:
public static int LineFeedCount(this string source)
{
var chars = source.ToCharArray();
bool found = false;
int counter = 0;
int count = 0;
foreach(var ch in chars)
{
if (ch == '\r') found = true;
else found = false;
if (found) counter++;
else counter = 0;
if(counter != 0 && counter%2 == 0) count++;
}
return count;
}
Then use it:
int count = inputString.LineFeedCount();
I have a long string and I want to fit that in a small field. To achieve that, I break the string into lines on whitespace. The algorithm goes like this:
public static string BreakLine(string text, int maxCharsInLine)
{
int charsInLine = 0;
StringBuilder builder = new StringBuilder();
for (int i = 0; i < text.Length; i++)
{
char c = text[i];
builder.Append(c);
charsInLine++;
if (charsInLine >= maxCharsInLine && char.IsWhiteSpace(c))
{
builder.AppendLine();
charsInLine = 0;
}
}
return builder.ToString();
}
But this breaks when there's a short word followed by a longer word. "foo howcomputerwork" with a max length of 16 doesn't break, but I want it to. One thought I had was looking ahead to see where the next whitespace occurs, but I'm not sure whether that would result in the fewest lines possible.
Enjoy!
public static string SplitToLines(string text, char[] splitOnCharacters, int maxStringLength)
{
var sb = new StringBuilder();
var index = 0;
while (text.Length > index)
{
// start a new line, unless we've just started
if (index != 0)
sb.AppendLine();
// get the next substring, else the rest of the string if remainder is shorter than `maxStringLength`
var splitAt = index + maxStringLength <= text.Length
? text.Substring(index, maxStringLength).LastIndexOfAny(splitOnCharacters)
: text.Length - index;
// if can't find split location, take `maxStringLength` characters
splitAt = (splitAt == -1) ? maxStringLength : splitAt;
// add result to collection & increment index
sb.Append(text.Substring(index, splitAt).Trim());
index += splitAt;
}
return sb.ToString();
}
Note that splitOnCharacters and maxStringLength could be saved in user settings area of the app.
Check the character before writing it to the string builder, and OR that check with the current count:
public static string BreakLine(string text, int maxCharsInLine)
{
int charsInLine = 0;
StringBuilder builder = new StringBuilder();
for (int i = 0; i < text.Length; i++)
{
char c = text[i];
if (char.IsWhiteSpace(c) || charsInLine >= maxCharsInLine)
{
builder.AppendLine();
charsInLine = 0;
}
else
{
builder.Append(c);
charsInLine++;
}
}
return builder.ToString();
}
I updated the code a bit, because @dead.rabit's version sometimes goes into an endless loop.
public static string SplitToLines(string text,char[] splitanyOf, int maxStringLength)
{
var sb = new System.Text.StringBuilder();
var index = 0;
var loop = 0;
while (text.Length > index)
{
// start a new line, unless we've just started
if (loop != 0)
{
sb.AppendLine();
}
// get the next substring, else the rest of the string if remainder is shorter than `maxStringLength`
var splitAt = 0;
if (index + maxStringLength <= text.Length)
{
splitAt = text.Substring(index, maxStringLength).LastIndexOfAny(splitanyOf);
}
else
{
splitAt = text.Length - index;
}
// if can't find split location, take `maxStringLength` characters
if (splitAt == -1 || splitAt == 0)
{
splitAt = text.IndexOfAny(splitanyOf, maxStringLength);
}
// add result to collection & increment index
sb.Append(text.Substring(index, splitAt).Trim());
if(text.Length > splitAt)
{
text = text.Substring(splitAt + 1).Trim();
}
else
{
text = string.Empty;
}
loop = loop + 1;
}
return sb.ToString();
}
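For comparison, a greedy word-by-word wrap can be sketched as below. This is only a sketch under stated assumptions: the names are hypothetical, runs of whitespace are collapsed to a single space, and a word longer than the limit is left unbroken on its own line. Because each line takes as many words as fit, this greedy approach should also give the fewest lines for a fixed width.

```csharp
using System;
using System.Text;

public static class WordWrapSketch
{
    // Greedy wrap: keep adding words to the current line while they fit,
    // otherwise start a new line.
    public static string Wrap(string text, int maxCharsInLine)
    {
        if (maxCharsInLine < 1) throw new ArgumentOutOfRangeException(nameof(maxCharsInLine));

        // A null separator array makes Split use whitespace as the delimiter.
        string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var builder = new StringBuilder();
        int charsInLine = 0;

        foreach (string word in words)
        {
            // The +1 accounts for the space that would precede the word.
            if (charsInLine > 0 && charsInLine + 1 + word.Length > maxCharsInLine)
            {
                builder.AppendLine();
                charsInLine = 0;
            }
            if (charsInLine > 0)
            {
                builder.Append(' ');
                charsInLine++;
            }
            builder.Append(word);
            charsInLine += word.Length;
        }
        return builder.ToString();
    }
}
```

With this sketch, `WordWrapSketch.Wrap("foo howcomputerwork", 16)` puts "foo" and "howcomputerwork" on separate lines.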
I am trying to read data from a text file in which the values are separated by commas. My problem is that one of the values contains a comma itself. An example of what the text file looks like is:
a, b, "c, d", e, f.
I want to be able to take the comma between c and d and change it to a semicolon so that I can still use the string.Split() method.
using (StreamReader reader = new StreamReader("file.txt"))
{
string line;
while ((line = reader.ReadLine ()) != null) {
bool firstQuote = false;
for (int i = 0; i < line.Length; i++)
{
if (line [i] == '"' )
{
firstQuote = true;
}
else if (firstQuote == true)
{
if (line [i] == '"')
{
break;
}
if ((line [i] == ','))
{
line = line.Substring (0, i) + ";" + line.Substring (i + 1, (line.Length - 1) - i);
}
}
}
Console.WriteLine (line);
}
}
I am having a problem. Instead of producing
a, b, "c; d", e, f
it is producing
a, b, "c; d"; e; f
It is replacing all of the following commas with semicolons instead of just the comma in the quotes. Can anybody help me fix my existing code?
Basically, when your code reaches the closing ", it treats it as another opening quote.
Change the line:
firstQuote = true;
to
firstQuote = !firstQuote;
and it should work.
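The same toggle idea can also be written without rebuilding the string with Substring on every replacement. The following is a minimal sketch with hypothetical names; it assumes quotes are always balanced and does not handle any escaping convention beyond plain double quotes.

```csharp
using System.Text;

public static class QuotedCommaSketch
{
    // Replaces commas with semicolons, but only between a pair of double quotes.
    public static string ReplaceQuotedCommas(string line)
    {
        var builder = new StringBuilder(line.Length);
        bool insideQuotes = false;

        foreach (char c in line)
        {
            if (c == '"') insideQuotes = !insideQuotes; // toggle on every quote
            builder.Append(insideQuotes && c == ',' ? ';' : c);
        }
        return builder.ToString();
    }
}
```

For the line from the question, a, b, "c, d", e, f, this yields a, b, "c; d", e, f.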
You need to reset firstQuote to false after you hit the second quote.
else if (firstQuote == true) {
if (line [i] == '"') {
firstQuote = false;
break;
}
Here is a simple application to get the required result
static void Main(string[] args)
{
String str = "a,b,\"c,d\",e,f,\"g,h\",i,j,k,l,\"m,n,o\"";
int firstQuoteIndex = 0;
int secodQuoteIndex = 0;
Console.WriteLine(str);
bool iteration = false;
//String manipulation
//if count is even then count/2 is the number of pairs of double quotes we are having
//so we have to traverse count/2 times.
int count = str.Count(s => s.Equals('"'));
if (count >= 2)
{
firstQuoteIndex = str.IndexOf("\"");
for (int i = 0; i < count / 2; i++)
{
if (iteration)
{
firstQuoteIndex = str.IndexOf("\"", firstQuoteIndex + 1);
}
secodQuoteIndex = str.IndexOf("\"", firstQuoteIndex + 1);
string temp = str.Substring(firstQuoteIndex + 1, secodQuoteIndex - (firstQuoteIndex + 1));
firstQuoteIndex = secodQuoteIndex + 1;
if (count / 2 > 1)
iteration = true;
string temp2= temp.Replace(',', ';');
str = str.Replace(temp, temp2);
Console.WriteLine(temp);
}
}
Console.WriteLine(str);
Console.ReadLine();
}
Please feel free to ask in case of doubt
string line = "a,b,mc,dm,e,f,mk,lm,g,h";
string result =replacestr(line, 'm', ',', ';');
public string replacestr(string line,char seperator,char oldchr,char newchr)
{
int cnt = 0;
StringBuilder b = new StringBuilder();
foreach (char chr in line)
{
if (cnt == 1 && chr == seperator)
{
b[b.ToString().LastIndexOf(oldchr)] = newchr;
b.Append(chr);
cnt = 0;
}
else
{
if (chr == seperator)
cnt = 1;
b.Append(chr);
}
}
return b.ToString();
}
What is the simplest way to get the line number from a character position in a string in C#?
(or to get the position of a line, i.e. of the first character in that line)
Is there a built-in function? If there is no such function, is it a good solution to write an extension like:
public static class StringExt {
public static int LineFromPos(this String S, int Pos) {
int Res = 1;
for (int i = 0; i <= Pos - 1; i++)
if (S[i] == '\n') Res++;
return Res;
}
public static int PosFromLine(this String S, int Pos) { .... }
}
?
Edited: Added method PosFromLine
A slight variation on Jan's suggestion, without creating a new string:
var lineNumber = input.Take(pos).Count(c => c == '\n') + 1;
Using Take limits the size of the input without having to copy the string data.
You should consider what you want the result to be if the given character is a line feed, by the way... as well as whether you want to handle "foo\rbar\rbaz" as three lines.
EDIT: To answer the new second part of the question, you could do something like:
var pos = input.Select((value, index) => new { value, index })
.Where(pair => pair.value == '\n')
.Select(pair => pair.index + 1)
.Take(line - 1)
.DefaultIfEmpty(0) // Handle line = 1, which starts at position 0
.Last();
I think that will work... but I'm not sure I wouldn't just write out a non-LINQ approach...
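A non-LINQ version might look like the following sketch. The names are hypothetical, lines are 1-based and positions 0-based, only '\n' is treated as a line break, and returning -1 for a line that does not exist is an assumption of this sketch.

```csharp
using System;

public static class LinePositionSketch
{
    // Returns the index of the first character of the given 1-based line,
    // or -1 if the text has fewer lines.
    public static int PosFromLine(string text, int line)
    {
        if (line < 1) throw new ArgumentOutOfRangeException(nameof(line));

        int pos = 0;
        for (int current = 1; current < line; current++)
        {
            int newline = text.IndexOf('\n', pos);
            if (newline < 0) return -1; // ran out of lines
            pos = newline + 1;          // the next line starts right after the '\n'
        }
        return pos;
    }
}
```

Usage: `LinePositionSketch.PosFromLine("ab\ncd\nef", 2)` returns 3.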
Count the number of newlines in the substringed input string.
var lineNumber = input.Substring(0, pos).Count(c=>c == '\n') + 1;
edit: and do a +1 because line numbers begin at 1 :-)
If you are going to call the function many times on the same long string, this class can be useful. It caches the line break positions, so that later it can perform an O(log (line breaks in string)) lookup for GetLine and an O(1) lookup for GetOffset.
public class LineBreakCounter
{
List<int> lineBreaks_ = new List<int>();
int length_;
public LineBreakCounter(string text)
{
if (text == null)
throw new ArgumentNullException(nameof(text));
length_ = text.Length;
for (int i = 0; i < text.Length; i++)
{
if (text[i] == '\n')
lineBreaks_.Add(i);
else if (text[i] == '\r' && i < text.Length - 1 && text[i + 1] == '\n')
lineBreaks_.Add(++i);
}
}
public int GetLine(int offset)
{
if (offset < 0 || offset > length_)
throw new ArgumentOutOfRangeException(nameof(offset));
var result = lineBreaks_.BinarySearch(offset);
if (result < 0)
return ~result;
else
return result;
}
public int Lines => lineBreaks_.Count + 1;
public int GetOffset(int line)
{
if (line < 0 || line >= Lines)
throw new ArgumentOutOfRangeException(nameof(line));
if (line == 0)
return 0;
return lineBreaks_[line - 1] + 1;
}
}
Here is my test case:
[TestMethod]
public void LineBreakCounter_ShouldFindLineBreaks()
{
var text = "Hello\nWorld!\r\n";
var counter = new LineBreakCounter(text);
Assert.AreEqual(0, counter.GetLine(0));
Assert.AreEqual(0, counter.GetLine(3));
Assert.AreEqual(0, counter.GetLine(5));
Assert.AreEqual(1, counter.GetLine(6));
Assert.AreEqual(1, counter.GetLine(8));
Assert.AreEqual(1, counter.GetLine(12));
Assert.AreEqual(1, counter.GetLine(13));
Assert.AreEqual(2, counter.GetLine(14));
Assert.AreEqual(3, counter.Lines);
Assert.AreEqual(0, counter.GetOffset(0));
Assert.AreEqual(6, counter.GetOffset(1));
Assert.AreEqual(14, counter.GetOffset(2));
}
For anyone interested in JavaScript or a more iterative approach:
const {min} = Math
function lineAndColumnNumbersAt(str, pos) {
let line = 1, col = 1
const _pos = min(str.length, pos)
for (let i = 0; i < _pos; i++)
if (str[i] === '\n') {
line++
col = 1
} else
col++
return {line, col}
}
lineAndColumnNumbersAt('test\ntest\ntest', 8)
In Ruby:
def line_index
source[0...position].count("\n")
end
def line_number
line_index + 1
end
def lines
source.lines
end
def line_source
lines[line_index]
end
def line_position
position - lines[0...line_index].map(&:size).sum
end