I am working on an application that was implemented on top of a SQLite database, and I am currently adding the ability to use MSSQL as well. The complicated part is that it must be able to use either engine depending on what is needed. There are a handful of syntax differences to account for, and the biggest problem child so far is LIMIT vs TOP.
I have written some logic to convert the SQLite statements into the proper format for MSSQL. My function for converting LIMIT to TOP seems to work, but it ended up pretty ugly. The biggest problem I ran into is nested SELECT statements that may carry LIMIT clauses of their own. I ended up pulling the statement apart into its individual parts, changing each from LIMIT to TOP, and then rebuilding the statement.
I wanted to post it here to see if anyone has ideas for a cleaner method, or notices any glaring issues I have missed. There might even be an overall better way to do this. Thanks ahead of time if you take a look.
private static string ConvertLimitToTop(string commandText)
{
string processCommand = commandText;
int start = -1;
List<string> commandParts = new List<string>();
//Running through the string looking for nested statements starting with (
for (int i = 0; i < processCommand.Length; i++)
{
//Any time we find a new open ( we want to start there
if (processCommand[i] == '(')
start = i;
//If we find a close ) we will grab the nested statement and replace it
if (processCommand[i] == ')')
{
//Grab the 3 parts of the string
string preString = processCommand.Substring(0, start);
string nestedCommand = processCommand.Substring(start, i - start + 1);
string postString = processCommand.Substring(i + 1);
//Add the nested command to the list
commandParts.Add(nestedCommand);
//Update the commandText replacing the nested command we removed with its index in the list
processCommand = preString + "{" + (commandParts.Count - 1) + "}" + postString;
//Go back to the beginning of the command and look for the next nested command
//(-1 because the loop increments i before the next check, so index 0 is rescanned)
i = -1;
start = -1;
}
}
//If start isn't -1 that means we found an open ( without a closing )
if (start == -1)
{
//We want to add the final command to the list for processing too
commandParts.Add(processCommand);
//We're going to go through the command parts and replace the LIMIT
for (int i = 0; i < commandParts.Count; i++)
{
string command = commandParts[i];
Console.WriteLine(command);
//We need to find where the LIMIT is and extract the number
int limitIndex = command.IndexOf("LIMIT");
if (limitIndex != -1)
{
int startIndex = limitIndex + 6;
//Assuming after the limit will be ), a space, or the end of the string
int endIndex = command.IndexOf(')', startIndex);
if (endIndex == -1)
endIndex = command.IndexOf(' ', startIndex);
if (endIndex == -1)
//endIndex is exclusive, so the end of the string is Length (Length - 1 would drop the last digit)
endIndex = command.Length;
Console.WriteLine(startIndex);
Console.WriteLine(endIndex);
//Extract the number
string limitNumber = command.Substring(startIndex, endIndex - startIndex);
//Remove the LIMIT command. There should always be a space before so take that out too.
command = command.Remove(limitIndex - 1, endIndex - limitIndex + 1);
//Insert the top command with the number
command = command.Replace("SELECT", "SELECT TOP " + limitNumber);
//Update the list
commandParts[i] = command;
}
}
start = -1;
//We need to go through the commands in reverse order and reassemble the complete command
for (int i = 0; i < processCommand.Length; i++)
{
//If we find a { its a part of the command that needs to be replaced
if (processCommand[i] == '{')
start = i;
if (processCommand[i] == '}')
{
string startString = processCommand.Substring(0, start);
string midString = processCommand.Substring(start, i - start + 1);
string endString = processCommand.Substring(i + 1);
//Get the index of the command we need from the list
int strIndex = Int32.Parse(midString.Substring(1, midString.Length - 2));
processCommand = processCommand.Replace(midString, commandParts[strIndex]);
//Go back to the start and look for the next
i = 0;
start = -1;
}
}
commandText = processCommand;
}
else
{
LogManager.Write(LogLevel.Error, "Unmatched parentheses were found while processing a SQL command. Command: " + commandText);
}
return commandText;
}
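One possible simplification, offered as a sketch rather than a drop-in replacement. It keeps the same pull-apart-and-rebuild idea but lets Regex find the innermost parentheses. It assumes that string literals in the SQL contain no parentheses or curly braces, that each parenthesized part has at most one SELECT and one numeric LIMIT, and that no OFFSET clause is used. The class name LimitToTopSketch and its members are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LimitToTopSketch
{
    // A parenthesized group that contains no further parentheses.
    private static readonly Regex Innermost = new Regex(@"\([^()]*\)");
    // " LIMIT <number>" together with the whitespace in front of it.
    private static readonly Regex Limit = new Regex(@"\s+LIMIT\s+(\d+)", RegexOptions.IgnoreCase);
    private static readonly Regex Select = new Regex(@"\bSELECT\b", RegexOptions.IgnoreCase);
    private static readonly Regex Placeholder = new Regex(@"\{(\d+)\}");

    public static string Convert(string sql)
    {
        var parts = new List<string>();

        // Swap innermost groups for {n} placeholders until no parentheses remain.
        while (Innermost.IsMatch(sql))
        {
            sql = Innermost.Replace(sql, m =>
            {
                parts.Add(ConvertPart(m.Value));
                return "{" + (parts.Count - 1) + "}";
            });
        }

        sql = ConvertPart(sql);

        // Put the converted parts back; nested placeholders need several passes.
        while (Placeholder.IsMatch(sql))
        {
            sql = Placeholder.Replace(sql, m => parts[int.Parse(m.Groups[1].Value)]);
        }
        return sql;
    }

    // Moves "LIMIT n" of a single, parenthesis-free part to "SELECT TOP n".
    private static string ConvertPart(string part)
    {
        Match m = Limit.Match(part);
        if (!m.Success) return part;
        part = part.Remove(m.Index, m.Length);
        return Select.Replace(part, "SELECT TOP " + m.Groups[1].Value, 1);
    }
}
```

Unlike the original function, this sketch does not report unmatched parentheses; a leftover ( or ) simply stays in the output.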
I think I am too dumb to solve this problem...
I have some formulas which need to be "translated" from one syntax to another.
Let's say I have a formula that goes like that (it's a simple one, others have many "Ceilings" in it):
string formulaString = "If([Param1] = 0, 1, Ceiling([Param2] / 0.55) * [Param3])";
I need to replace "Ceiling()" with "Ceiling(; 1)" (basically, insert "; 1" before the ")").
My attempt is to split the formulaString at "Ceiling(" so I am able to iterate through the string array and insert my string at the correct index (counting every "(" and ")" to get the right index).
What I have so far:
//splits correct, but loses "CEILING("
string[] parts = formulaString.Split(new[] { "CEILING(" }, StringSplitOptions.None);
//splits almost correct, "CEILING(" is in another group
string[] parts = Regex.Split(formulaString, @"(CEILING\()");
//splits almost every letter
string[] parts = Regex.Split(formulaString, @"(?=[(CEILING\()])");
When everything is done, I concat the string so I have my complete formula again.
What do I have to set as Regex pattern to achieve this sample? (Or any other method that will help me)
part1 = "If([Param1] = 0, 1, ";
part2 = "Ceiling([Param2] / 0.55) * [Param3])";
//part3 = next "CEILING(" in a longer formula and so on...
As I mention in a comment, you almost got it: (?=Ceiling). This is incomplete for your use case unfortunately.
I need to replace "Ceiling()" with "Ceiling(; 1)" (basically, insert "; 1" before the ")").
Depending on your regex engine (for example JS) this works:
string[] parts = Regex.Split(formulaString, @"(?<=Ceiling\([^)]*(?=\)))");
string modifiedFormula = String.Join("; 1", parts);
The regex
(?<=Ceiling\([^)]*(?=\)))
(?<= ) Positive lookbehind
Ceiling\( Search for literal "Ceiling("
[^)] Match any char which is not ")" ..
* .. 0 or more times
(?=\)) Positive lookahead for ")", effectively making us stop before the ")"
This regex is a zero-width assertion, so nothing is consumed and the string is cut right before the closing ")" of every "Ceiling()".
This solution would break whenever you have nested "Ceiling()". Then your only solution would be writing your own parser for the same reasons why you can't parse markup with regex.
Regex.Replace(formulaString, @"(?<=Ceiling\()(.*?)(?=\))","$1; 1");
Note: This will not work for nested "Ceilings", but it does for a plain Ceiling(). It will also not work for Ceiling(AnotherFunc(x)). For that you need something like:
Regex.Replace(formulaString, @"(?<=Ceiling\()((.*\((?>[^()]+|(?1))*\))*|[^\)]*)(\))","$1; 1$3");
but I could not get that to work with .NET, only in JavaScript.
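For .NET specifically, the (?1) recursion syntax is not available, but balancing groups can track parenthesis depth instead. The following is a minimal sketch under that approach, not a tested production pattern: the class name CeilingSketch, the method name AddSecondArgument and the group names inner and depth are made up for illustration, and it assumes that parentheses in the formula are balanced.

```csharp
using System.Text.RegularExpressions;

public static class CeilingSketch
{
    // "Ceiling(" + balanced content + ")".
    // (?<depth>) pushes on every "(", (?<-depth>) pops on every ")",
    // and (?(depth)(?!)) fails the match if anything is left unclosed.
    private static readonly Regex CeilingCall = new Regex(
        @"Ceiling\((?<inner>(?>[^()]+|\((?<depth>)|\)(?<-depth>))*(?(depth)(?!)))\)");

    public static string AddSecondArgument(string formula)
    {
        // Recurse into the argument so Ceiling calls nested inside it are handled too.
        return CeilingCall.Replace(formula,
            m => "Ceiling(" + AddSecondArgument(m.Groups["inner"].Value) + "; 1)");
    }
}
```

Because the evaluator calls AddSecondArgument on the captured argument, a Ceiling inside another Ceiling, or next to another function call, gets its own "; 1" as well.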
This is my solution:
private string ConvertCeiling(string formula)
{
int ceilingsCount = formula.CountOccurences("Ceiling(");
int startIndex = 0;
int bracketCounter;
for (int i = 0; i < ceilingsCount; i++)
{
startIndex = formula.IndexOf("Ceiling(", startIndex);
bracketCounter = 0;
for (int j = 0; j < formula.Length; j++)
{
if (j < startIndex) continue;
var c = formula[j];
if (c == '(')
{
bracketCounter++;
}
if (c == ')')
{
bracketCounter--;
if (bracketCounter == 0)
{
// found end
formula = formula.Insert(j, "; 1");
startIndex++;
break;
}
}
}
}
return formula;
}
And CountOccurences:
public static int CountOccurences(this string value, string parameter)
{
int counter = 0;
int startIndex = 0;
int indexOfCeiling;
do
{
indexOfCeiling = value.IndexOf(parameter, startIndex);
if (indexOfCeiling < 0)
{
break;
}
else
{
startIndex = indexOfCeiling + 1;
counter++;
}
} while (true);
return counter;
}
Let's say I have a string like this one, left part is a word, right part is a collection of indices (single or range) used to reference furigana (phonetics) for kanjis in my word:
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす"
The pattern in detail:
word,<startIndex>(-<endIndex>):<furigana>
What would be the best way to achieve something like this (with a space in front of the kanji to mark which part is linked to the [furigana]):
子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Edit: (thanks for your comments guys)
Here is what I wrote so far:
static void Main(string[] args)
{
string myString = "ABCDEF,1:test;3:test2";
//Split Kanjis / Indices
string[] tokens = myString.Split(',');
//Extract furigana indices
string[] indices = tokens[1].Split(';');
//Dictionary to store furigana indices
Dictionary<string, string> furiganaIndices = new Dictionary<string, string>();
//Collect
foreach (string index in indices)
{
string[] splitIndex = index.Split(':');
furiganaIndices.Add(splitIndex[0], splitIndex[1]);
}
//Processing
string result = tokens[0] + ",";
for (int i = 0; i < tokens[0].Length; i++)
{
string currentIndex = i.ToString();
if (furiganaIndices.ContainsKey(currentIndex)) //add [furigana]
{
string currentFurigana = furiganaIndices[currentIndex].ToString();
result = result + " " + tokens[0].ElementAt(i) + string.Format("[{0}]", currentFurigana);
}
else //nothing to add
{
result = result + tokens[0].ElementAt(i);
}
}
File.AppendAllText(@"D:\test.txt", result + Environment.NewLine);
}
Result:
ABCDEF,A B[test]C D[test2]EF
I struggle to find a way to process ranged indices:
string myString = "ABCDEF,1:test;2-3:test2";
Result : ABCDEF,A B[test] CD[test2]EF
I don't have anything against manually manipulating strings per se. But given that you seem to have a regular pattern describing the inputs, it seems to me that a solution that uses regex would be more maintainable and readable. So with that in mind, here's an example program that takes that approach:
class Program
{
private const string _kinvalidFormatException = "Invalid format for edit specification";
private static readonly Regex
regex1 = new Regex(@"(?<word>[^,]+),(?<edit>(?:\d+)(?:-(?:\d+))?:(?:[^;]+);?)+", RegexOptions.Compiled),
regex2 = new Regex(@"(?<start>\d+)(?:-(?<end>\d+))?:(?<furigana>[^;]+);?", RegexOptions.Compiled);
static void Main(string[] args)
{
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
string result = EditString(myString);
}
private static string EditString(string myString)
{
Match editsMatch = regex1.Match(myString);
if (!editsMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int ichCur = 0;
string input = editsMatch.Groups["word"].Value;
StringBuilder text = new StringBuilder();
foreach (Capture capture in editsMatch.Groups["edit"].Captures)
{
Match oneEditMatch = regex2.Match(capture.Value);
if (!oneEditMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int start, end;
if (!int.TryParse(oneEditMatch.Groups["start"].Value, out start))
{
throw new ArgumentException(_kinvalidFormatException);
}
Group endGroup = oneEditMatch.Groups["end"];
if (endGroup.Success)
{
if (!int.TryParse(endGroup.Value, out end))
{
throw new ArgumentException(_kinvalidFormatException);
}
}
else
{
end = start;
}
text.Append(input.Substring(ichCur, start - ichCur));
if (text.Length > 0)
{
text.Append(' ');
}
ichCur = end + 1;
text.Append(input.Substring(start, ichCur - start));
text.Append(string.Format("[{0}]", oneEditMatch.Groups["furigana"]));
}
if (ichCur < input.Length)
{
text.Append(input.Substring(ichCur));
}
return text.ToString();
}
}
Notes:
This implementation assumes that the edit specifications will be listed in order and won't overlap. It makes no attempt to validate that part of the input; depending on where you are getting your input from you may want to add that. If it's valid for the specifications to be listed out of order, you can also extend the above to first store the edits in a list and sort the list by the start index before actually editing the string. (In similar fashion to the way the other proposed answer works; though, why they are using a dictionary instead of a simple list to store the individual edits, I have no idea…that seems arbitrarily complicated to me.)
I included basic input validation, throwing exceptions where failures occur in the pattern matching. A more user-friendly implementation would add more specific information to each exception, describing what part of the input actually was invalid.
The Regex class actually has a Replace() method, which allows for complete customization. The above could have been implemented that way, using Replace() and a MatchEvaluator to provide the replacement text, instead of just appending text to a StringBuilder. Which way to do it is mostly a matter of preference, though the MatchEvaluator might be preferred if you have a need for more flexible implementation options (i.e. if the exact format of the result can vary).
If you do choose to use the other proposed answer, I strongly recommend you use StringBuilder instead of simply concatenating onto the results variable. For short strings it won't matter much, but you should get into the habit of always using StringBuilder when a loop incrementally adds onto a string value, because for long strings the performance cost of concatenation can be severe.
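The first note (store the edits in a list and sort them by start index before editing) could look roughly like the sketch below. It is an illustration only: the names FuriganaSketch, Annotate and ParseEdit are invented, the input is assumed to be well formed and non-overlapping, and it uses plain Split instead of the regular expressions above to stay short.

```csharp
using System.Linq;
using System.Text;

public static class FuriganaSketch
{
    public static string Annotate(string spec)
    {
        int comma = spec.IndexOf(',');
        string word = spec.Substring(0, comma);

        // Parse every "start(-end):furigana" item, then sort by start index
        // so the specifications may arrive in any order.
        var edits = spec.Substring(comma + 1)
            .Split(';')
            .Select(ParseEdit)
            .OrderBy(e => e.Start)
            .ToList();

        var text = new StringBuilder();
        int pos = 0; // first character of the word not yet copied
        foreach (var e in edits)
        {
            text.Append(word.Substring(pos, e.Start - pos));
            if (text.Length > 0) text.Append(' ');
            text.Append(word.Substring(e.Start, e.End - e.Start + 1));
            text.Append('[').Append(e.Reading).Append(']');
            pos = e.End + 1;
        }
        text.Append(word.Substring(pos));
        return text.ToString();
    }

    private static (int Start, int End, string Reading) ParseEdit(string edit)
    {
        string[] parts = edit.Split(new[] { ':' }, 2);
        string[] range = parts[0].Split('-');
        int start = int.Parse(range[0]);
        int end = range.Length > 1 ? int.Parse(range[1]) : start;
        return (start, end, parts[1]);
    }
}
```

Sorting up front keeps the copy loop identical to the in-order case; validation of overlapping ranges is still left out.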
This should do it (and even handle ranged indices), based on the formatting of the input string you have-
using System;
using System.Collections.Generic;
public class stringParser
{
private struct IndexElements
{
public int start;
public int end;
public string value;
}
public static void Main()
{
//input string
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
int wordIndexSplit = myString.IndexOf(',');
string word = myString.Substring(0,wordIndexSplit);
string indices = myString.Substring(wordIndexSplit + 1);
string[] eachIndex = indices.Split(';');
Dictionary<int,IndexElements> index = new Dictionary<int,IndexElements>();
string[] elements;
IndexElements e;
int dash;
int n = 0;
int last = -1;
string results = "";
foreach (string s in eachIndex)
{
e = new IndexElements();
elements = s.Split(':');
if (elements[0].Contains("-"))
{
dash = elements[0].IndexOf('-');
e.start = int.Parse(elements[0].Substring(0,dash));
e.end = int.Parse(elements[0].Substring(dash + 1));
}
else
{
e.start = int.Parse(elements[0]);
e.end = e.start;
}
e.value = elements[1];
index.Add(n,e);
n++;
}
//this is the part that takes the "setup" from the parts above and forms the result string
//loop through each of the "indices" parsed above
for (int i = 0; i < index.Count; i++)
{
//if this is the first iteration through the loop, and the first "index" does not start
//at position 0, add the beginning characters before its start
if (last == -1 && index[i].start > 0)
{
results += word.Substring(0,index[i].start);
}
//if this is not the first iteration through the loop, and the previous iteration did
//not stop at the position directly before the start of the current iteration, add
//the intermediary characters
else if (last != -1 && last + 1 != index[i].start)
{
results += word.Substring(last + 1,index[i].start - (last + 1));
}
//add the space before the "index" match, the actual match, and then the formatted "index"
results += " " + word.Substring(index[i].start,(index[i].end - index[i].start) + 1)
+ "[" + index[i].value + "]";
//remember the position of the ending for the next iteration
last = index[i].end;
}
//if the last "index" did not stop at the end of the input string, add the remaining characters
if (index[index.Keys.Count - 1].end + 1 < word.Length)
{
results += word.Substring(index[index.Keys.Count-1].end + 1);
}
//trimming spaces that may be left behind
results = results.Trim();
Console.WriteLine("INPUT - " + myString);
Console.WriteLine("OUTPUT - " + results);
Console.Read();
}
}
input - 子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす
output - 子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Note that this should also work with characters from the English alphabet if you wanted to use English instead:
input - iliketocodeverymuch,2:A;4-6:B;9-12:CDEFG
output - il i[A]k eto[B]co deve[CDEFG]rymuch
I am trying to extract information out of a string - a fortran formatting string to be specific. The string is formatted like:
F8.3, I5, 3(5X, 2(A20,F10.3)), 'XXX'
with formatting fields delimited by "," and formatting groups inside brackets, with the number in front of the brackets indicating how many consecutive times the formatting pattern is repeated. So, the string above expands to:
F8.3, I5, 5X, A20,F10.3, A20,F10.3, 5X, A20,F10.3, A20,F10.3, 5X, A20,F10.3, A20,F10.3, 'XXX'
I am trying to make something in C# that will expand a string that conforms to that pattern. I have started going about it with lots of switch and if statements, but am wondering if I am not going about it the wrong way?
I was basically wondering if some Regex wizard thinks that regular expressions can do this in one fell swoop? I know nothing about regular expressions, but if they could solve my problem I am considering putting in some time to learn how to use them... on the other hand, if regular expressions can't sort this out then I'd rather spend my time looking at another method.
This has to be doable with Regex :)
I've expanded my previous example and it tests nicely against your example.
// regex to match the inner most patterns of n(X) and capture the values of n and X.
private static readonly Regex matcher = new Regex(@"(\d+)\(([^(]*?)\)", RegexOptions.None);
// create new string by repeating X n times, separated with ','
private static string Join(Match m)
{
var n = Convert.ToInt32(m.Groups[1].Value); // get value of n
var x = m.Groups[2].Value; // get value of X
return String.Join(",", Enumerable.Repeat(x, n));
}
// expand the string by recursively replacing the innermost values of n(X).
private static string Expand(string text)
{
var s = matcher.Replace(text, Join);
return (matcher.IsMatch(s)) ? Expand(s) : s;
}
// parse a string for occurrences of the n(X) pattern and expand them.
// return the string as a tokenized array.
public static string[] Parse(string text)
{
// Check that the number of parentheses is even.
if (text.Sum(c => (c == '(' || c == ')') ? 1 : 0) % 2 == 1)
throw new ArgumentException("The string contains an odd number of parentheses.");
return Expand(text).Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
}
I would suggest using a recursive method like the example below (not tested):
ResultData Parse(String value, ref Int32 index)
{
ResultData result = new ResultData();
Int32 startIndex = index; // Used to get substrings
while (index < value.Length)
{
Char current = value[index];
if (current == '(')
{
index++;
result.Add(Parse(value, ref index));
startIndex = index;
continue;
}
if (current == ')')
{
// Push last result
index++;
return result;
}
// Process all other chars here
index++; // advance, otherwise the loop never terminates
}
// We can't find the closing bracket
throw new Exception("String is not valid");
}
You may need to modify some parts of the code, but this is the method I used when writing a simple compiler. It is not complete, just an example.
Personally, I would suggest using a recursive function instead. Every time you hit an opening parenthesis, call the function again to parse that part. I'm not sure if you can use a regex to match a recursive data structure.
(Edit: Removed incorrect regex)
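To make the recursive suggestion concrete, here is a minimal sketch of a recursive expander for the format string from the question. It is only an illustration under some assumptions: quoted literals such as 'XXX' contain no commas or parentheses, a group without a leading number repeats once, and the names FormatExpander, Expand, ParseGroup and Flush are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FormatExpander
{
    public static List<string> Expand(string format)
    {
        int index = 0;
        return ParseGroup(format, ref index, false);
    }

    // Reads fields until the end of the string or, when nested, until the matching ')'.
    private static List<string> ParseGroup(string s, ref int index, bool nested)
    {
        var fields = new List<string>();
        var token = new StringBuilder();
        while (index < s.Length)
        {
            char c = s[index++];
            if (c == '(')
            {
                // Whatever was collected before '(' is the repeat count.
                string countText = token.ToString().Trim();
                int repeat = countText.Length == 0 ? 1 : int.Parse(countText);
                token.Clear();
                List<string> inner = ParseGroup(s, ref index, true);
                for (int i = 0; i < repeat; i++) fields.AddRange(inner);
            }
            else if (c == ',' || c == ')')
            {
                Flush(token, fields);
                if (c == ')')
                {
                    if (!nested) throw new FormatException("Unmatched ')'");
                    return fields;
                }
            }
            else
            {
                token.Append(c);
            }
        }
        if (nested) throw new FormatException("Missing ')'");
        Flush(token, fields);
        return fields;
    }

    // Adds the pending token as a field, ignoring empty tokens.
    private static void Flush(StringBuilder token, List<string> fields)
    {
        string field = token.ToString().Trim();
        if (field.Length > 0) fields.Add(field);
        token.Clear();
    }
}
```

Calling string.Join(", ", FormatExpander.Expand("F8.3, I5, 3(5X, 2(A20,F10.3)), 'XXX'")) yields the 18 fields of the expanded line shown in the question, separated uniformly by ", ".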
Ended up rewriting this today. It turns out that this can be done in one single method:
private static string ExpandBrackets(string Format)
{
int maxLevel = CountNesting(Format);
for (int currentLevel = maxLevel; currentLevel > 0; currentLevel--)
{
int level = 0;
int start = 0;
int end = 0;
for (int i = 0; i < Format.Length; i++)
{
char thisChar = Format[i];
switch (Format[i])
{
case '(':
level++;
if (level == currentLevel)
{
string group = string.Empty;
int repeat = 0;
/// Isolate the number of repeats if any
/// If there are 0 repeats the set to 1 so group will be replaced by itself with the brackets removed
for (int j = i - 1; j >= 0; j--)
{
char c = Format[j];
if (c == ',')
{
start = j + 1;
break;
}
if (char.IsDigit(c))
repeat = int.Parse(c + (repeat != 0 ? repeat.ToString() : string.Empty));
else
throw new Exception("Non-numeric character " + c + " found in front of the brackets");
}
if (repeat == 0)
repeat = 1;
/// Isolate the format group
/// Parse until the first closing bracket. Level is decremented as this effectively takes us down one level
for (int j = i + 1; j < Format.Length; j++)
{
char c = Format[j];
if (c == ')')
{
level--;
end = j;
break;
}
group += c;
}
/// Substitute the expanded group for the original group in the format string
/// If the group is empty then just remove it from the string
if (string.IsNullOrEmpty(group))
{
Format = Format.Remove(start - 1, end - start + 2);
i = start;
}
else
{
string repeatedGroup = RepeatString(group, repeat);
Format = Format.Remove(start, end - start + 1).Insert(start, repeatedGroup);
i = start + repeatedGroup.Length - 1;
}
}
break;
case ')':
level--;
break;
}
}
}
return Format;
}
CountNesting() returns the highest level of bracket nesting in the format statement, but could be passed in as a parameter to the method. RepeatString() just repeats a string the specified number of times and substitutes it for the bracketed group in the format string.
I'm using LINQ and returning a list to my Business Logic Layer. I'm trying to change one of the values in the list (changing the 'star' rating to an image with the number of stars).
Although the counter (i) appears to be working, the FOR loop is not working correctly. The first time through it stops at the correct IF but then it pops out at the ELSE statement for everything and all values end up with "star0.png." It appears as though I'm not cycling through the list??? Thanks in advance!
for (int i = 0; i < ReviewList.Count; i++)
{
string serviceCode = ReviewList[i].SERVICE.SERVICE_DESC;
if (serviceCode == "*")
{
ReviewList[i].SERVICE.SERVICE_DESC = "star1.png";
}
else if (serviceCode == "**")
{
ReviewList[i].SERVICE.SERVICE_DESC = "star2.png";
}
else if (serviceCode == "***")
{
ReviewList[i].SERVICE.SERVICE_DESC = "star3.png";
}
else if (serviceCode == "****")
{
ReviewList[i].SERVICE.SERVICE_DESC = "star4.png";
}
else
{
ReviewList[i].SERVICE.SERVICE_DESC = "star0.png";
}
}
If all values end up at star0.png, then you are cycling through the list. The fact that the else statement is the only code being executed for each element suggests a logical error -- did you perhaps mean to do something like this?
string serviceCode = ReviewList[i].SERVICE.SERVICE_CODE;
I don't think it's an issue of the for loop working properly... your syntax is good and, as written, it will iterate ReviewList.Count times.
I would step through and verify the contents of ReviewList first.
Let me know what you find
If you know each item will consist of a number of stars, why not do this?:
for (int i = 0; i < ReviewList.Count; i++)
{
string serviceCode = ReviewList[i].SERVICE.SERVICE_DESC;
ReviewList[i].SERVICE.SERVICE_DESC = "star" + serviceCode.Length + ".png";
}
Protection on double pass and with else condition
for (int i = 0; i < ReviewList.Count; i++)
{
string serviceCode = ReviewList[i].SERVICE.SERVICE_DESC;
if(!serviceCode.Contains(".png")) { // once name set should not be modified
if(serviceCode.Contains("*"))
ReviewList[i].SERVICE.SERVICE_DESC = "star" + serviceCode.Length + ".png";
else
ReviewList[i].SERVICE.SERVICE_DESC = "star0.png";
}
}
alternate LINQ approach
ReviewList.ForEach(rs =>
{
// a statement lambda needs braces around the if
if (!rs.SERVICE.SERVICE_DESC.Contains(".png"))
{
rs.SERVICE.SERVICE_DESC = "star" + rs.SERVICE.SERVICE_DESC.Length + ".png";
}
});
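The length-based answers above can be folded into one small helper that also mirrors the else branch of the original if chain. This is a sketch only: the class StarRatingSketch and the method ToImageName are invented names, and the limit of four stars is taken from the images listed in the question.

```csharp
using System.Linq;

public static class StarRatingSketch
{
    private const int MaxStars = 4; // star1.png .. star4.png exist in the question

    public static string ToImageName(string rating)
    {
        // Already converted on an earlier pass: leave it alone.
        if (rating != null && rating.EndsWith(".png")) return rating;

        // Only a string made purely of 1..MaxStars asterisks counts as a rating.
        bool valid = !string.IsNullOrEmpty(rating)
            && rating.Length <= MaxStars
            && rating.All(c => c == '*');

        return "star" + (valid ? rating.Length : 0) + ".png";
    }
}
```

Inside the loop it would be used as ReviewList[i].SERVICE.SERVICE_DESC = StarRatingSketch.ToImageName(ReviewList[i].SERVICE.SERVICE_DESC);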
I need to add functionality to my program so that, for any imported file, it finds the text within the "" of the addTestingPageContentText method calls shown below. The two values on each line will then be added to a DataGridView with two columns: the first text goes in the first column and the second text in the second column. How would I go about finding the "Sometext" values?
addTestingPageContentText("Sometext", "Sometext");
addTestingPageContentText("Sometext2", "Sometext2");
... continues n number of times.
Neither fast nor efficient, but it's easier to understand for those new to regular expressions:
string line;
//read line by line until the end of the file (file is assumed to be a StreamReader)
while ((line = file.ReadLine()) != null)
{
//EDIT: trim whitespace at the start
line = line.Trim();
//check for your string
if (line.StartsWith("addTestingPageContentText"))
{
int start1;
int start2;
//get the first something by finding a "
for (start1 = 0; start1 < line.Length; start1++)
{
if (line.Substring(start1, 1) == '"'.ToString())
{
start1++;
break;
}
}
//get the end of the first something
for (start2 = start1; start2 < line.Length; start2++)
{
if (line.Substring(start2, 1) == '"'.ToString())
{
start2--;
break;
}
}
//start2 points at the last character of the text, so the length is start2 - start1 + 1
string sometext1 = line.Substring(start1, start2 - start1 + 1);
//get the second something by finding a "
for (start1 = start2 + 2; start1 < line.Length; start1++)
{
if (line.Substring(start1, 1) == '"'.ToString())
{
start1++;
break;
}
}
//get the end of the second something
for (start2 = start1; start2 < line.Length; start2++)
{
if (line.Substring(start2, 1) == '"'.ToString())
{
start2--;
break;
}
}
//start2 points at the last character of the text, so the length is start2 - start1 + 1
string sometext2 = line.Substring(start1, start2 - start1 + 1);
}
}
However I would seriously recommend going through some of the great tutorials out there on the internet. This is quite a good one
The expression "\"[^"]*\"" would find each...
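A regular-expression version of the same extraction might look like the sketch below. It assumes that the quoted texts never contain escaped quotes and that each call has exactly two string arguments; the class name ContentTextSketch and the method Extract are invented for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ContentTextSketch
{
    // Two quoted arguments; each "([^"]*)" group captures the text between a pair of quotes.
    private static readonly Regex Call = new Regex(
        @"addTestingPageContentText\(\s*""([^""]*)""\s*,\s*""([^""]*)""\s*\)");

    // Returns one { first, second } pair per call found in the text.
    public static List<string[]> Extract(string fileText)
    {
        var rows = new List<string[]>();
        foreach (Match m in Call.Matches(fileText))
        {
            rows.Add(new[] { m.Groups[1].Value, m.Groups[2].Value });
        }
        return rows;
    }
}
```

Each returned pair can then be added as one row of the two-column grid, for example with Rows.Add(row[0], row[1]) on the DataGridView.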