Reverse complement of a sequence

Problem is below (I need to write a ReverseComplement method in C#):
The reverse complement of a sequence is formed by exchanging all of its nucleobases with their base complements, and then reversing the resulting sequence. The reverse complement of a DNA sequence is formed by exchanging all instances of:
A with T
T with A
G with C
C with G
Then reversing the resulting sequence.
For example:
Given the DNA sequence AAGCT the reverse complement is AGCTT
This method, ReverseComplement(), must take the following parameter:
Reference to a DNA sequence
This method should return void and mutate the referenced DNA sequence to its reverse complement.
Currently, here is my code,
string result = z.Replace('A', 'T').Replace('T', 'A').Replace('G', 'C').Replace('C', 'G');
string before = (result);
return before;
I'm stuck and wondering how I do this? Any help would be greatly appreciated. When I run this I get AAGGA and not AGCTT

using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp8
{
class Program
{
static void Main(string[] args)
{
var dict = new Dictionary<char, char>()
{
['A'] = 'T',
['T'] = 'A',
['G'] = 'C',
['C'] = 'G',
};
var input = "AAGCT";
var output = string.Concat(input.Select(c => dict[c]).Reverse()); // AGCTT
Console.WriteLine(input);
Console.WriteLine(output);
}
}
}

When I run this I get AAGGA and not AGCTT
Because the Replace calls are applied one after another, each on the output of the previous one, rather than as a single simultaneous substitution:
z.Replace('A', 'T').Replace('T', 'A').Replace('G', 'C').Replace('C', 'G');
AAGCT
Replace('A', 'T')
TTGCT
Replace('T', 'A')
AAGCA
Replace('G', 'C')
AACCA
.Replace('C', 'G')
AAGGA
Instead what I would recommend is intermediary replace:
var z = "AAGCT";
var chars = z.Replace('A', '1')
.Replace('T', 'A')
.Replace('1', 'T')
.Replace('G', '2')
.Replace('C', 'G')
.Replace('2', 'C')
.Reverse()
.ToArray();
var result = new string(chars);
Console.WriteLine(result);
Yields:
AGCTT
Now if you're doing this millions of times, you may want to consider using a StringBuilder instead.
Recommended reading: The Sad Tragedy of Micro-Optimization Theater

A little trick to Replace-version:
using System;
using System.Linq;
namespace DNA
{
public class Program
{
public static void Main()
{
var dna = "AAGCT";
var reversed = new String(dna
.ToLower()
.Replace('a', 'T')
.Replace('t', 'A')
.Replace('g', 'C')
.Replace('c', 'G')
.Reverse()
.ToArray());
Console.WriteLine(reversed);
}
}
}

Or good old StringBuilder:
using System;
using System.Text;
namespace DNA
{
public class Program
{
public static void Main()
{
var dna = "AAGCT";
var sb = new StringBuilder(dna.Length);
for (var i = dna.Length - 1; i > -1; i--)
{
switch(dna[i])
{
case 'A':
sb.Append('T');
break;
case 'T':
sb.Append('A');
break;
case 'G':
sb.Append('C');
break;
case 'C':
sb.Append('G');
break;
}
}
var reversed = sb.ToString();
Console.WriteLine(reversed);
}
}
}

Instead of replacing each char, it's easier to implement using linq:
void Translate(ref string dna)
{
var map = new string[] {"AGTC", "TCAG"};
dna = string.Join("", dna.Select(c => map[1][map[0].IndexOf(c)]).Reverse());
}
You start with a string array that represents the mappings - then you select the mapped char for each char of the string, reverse the IEnumerable<char> you get from the Select, and use string.Join to convert it back to a string.
The code in the question first converts A to T, and then converts T to A, so everything that was A ends up as A again, and everything that was T also ends up as A (the same goes for G and C).
And also a non-linq solution based on a for loop and string builder (translation logic is the same):
void Translate(ref string dna)
{
var map = new string[] {"AGTC", "TCAG"};
var sb = new StringBuilder(dna.Length);
for(int i = dna.Length-1; i > -1; i--)
{
sb.Append(map[1][map[0].IndexOf(dna[i])]);
}
dna = sb.ToString();
}
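One thing to watch with the map-based versions above: if the input contains a character that is not in map[0], IndexOf returns -1 and the lookup fails with an out-of-range index. Below is a minimal sketch of a variant that matches the signature the exercise asks for (void, takes the sequence by reference) and rejects unknown bases explicitly. The class name Dna and the helper Complement are just names picked for illustration, and throwing ArgumentException is an assumption about how bad input should be handled.

```csharp
using System;

static class Dna
{
    // Maps one base to its complement; anything other than A, T, G or C is rejected.
    static char Complement(char b)
    {
        switch (b)
        {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'G': return 'C';
            case 'C': return 'G';
            default: throw new ArgumentException("Unexpected base: " + b);
        }
    }

    // Returns void and replaces the referenced string with its reverse complement.
    public static void ReverseComplement(ref string dna)
    {
        var result = new char[dna.Length];
        for (int i = 0; i < dna.Length; i++)
        {
            // Read from the end, write from the start: reverses and complements in one pass.
            result[i] = Complement(dna[dna.Length - 1 - i]);
        }
        dna = new string(result);
    }
}
```

Called as Dna.ReverseComplement(ref sequence), this turns AAGCT into AGCTT. Because each character is mapped exactly once, the A/T and G/C swaps cannot overwrite each other the way chained Replace calls do.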

Related

Extracting character-number pairs from a string

Character-values pairs are received continuously from serial port in the following format
h135v48s167
where h has the value 135, v has 48 and s has 167 (the numeric values range from 0 to 2000).
I am using if-else statements to perform specific actions based on the values of h, v and s.
I tried the regex v(\d+) to get the value of v, which gives v48 as the result.
How can I get the numeric value only?
I have to use a regex 3 times to get the values of h, v and s. Can a single regex do this?
Is there a better way without using regex?
Following is the section of the code where I am using this -
if (port.IsOpen) {
if (port.BytesToRead > 0) {
// port.WriteLine ("p");
string data = port.ReadExisting ();
if (!string.IsNullOrEmpty (data)) {
h = Regex.Match (data, @"h\d+").Value;
v = Regex.Match (data, @"v\d+").Value;
s = Regex.Match (data, @"s\d+").Value;
if (h > 150) {
// do something
}
if (v < 30) {
// do something
}
} else {
// default
}
}
}

Using Regex is fine. To allow for arbitrary letters, use [a-z] (or use [hvs] instead if you only want to catch these letters). You can capture both the letter and the number with parentheses and refer to them through the Match.Groups collection.
var data = "h135v48s167";
foreach (Match m in Regex.Matches(data, @"([a-z])(\d+)"))
{
var variable = m.Groups[1].Value[0];
var value = Convert.ToInt32(m.Groups[2].Value);
Console.WriteLine("{0}={1}", variable, value);
switch (variable)
{
case 'h':
// do something
break;
case 'v':
// do something
break;
case 's':
// do something
break;
}
}
gives:
h=135
v=48
s=167
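If the values are needed later in if-else checks rather than inside the loop, the same capture-group idea can fill a dictionary in one pass. This is only a sketch: the names SerialParser and ParsePairs are made up for illustration, and it assumes each chunk contains complete pairs (ReadExisting may in practice return a chunk that ends mid-number, which this does not handle).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SerialParser
{
    // Returns the number that follows each of the letters h, v and s.
    public static Dictionary<char, int> ParsePairs(string data)
    {
        var values = new Dictionary<char, int>();
        foreach (Match m in Regex.Matches(data, @"([hvs])(\d+)"))
        {
            // Group 1 is the letter, group 2 holds only the digits (no leading letter).
            values[m.Groups[1].Value[0]] = int.Parse(m.Groups[2].Value);
        }
        return values;
    }
}
```

With the result in a variable named values, a check such as values['h'] > 150 then compares integers. If a letter appears more than once in the chunk, the last value wins.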

Use Regex :
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication11
{
class Program
{
const string FILENAME = @"c:\temp\test.txt";
static void Main(string[] args)
{
string input = "h135v48s167";
string pattern = "h(?'h'[^v]+)v(?'v'[^s]+)s(?'s'.*)";
Match match = Regex.Match(input,pattern);
string h = match.Groups["h"].Value;
string v = match.Groups["v"].Value;
string s = match.Groups["s"].Value;
}
}
}
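The question also asks whether this can be done without regex. A hand-written scan is one option. The sketch below is an assumption about what such a parser could look like, not something taken from the answers above; ManualPairParser and Parse are hypothetical names. It skips anything that is not a letter followed by at least one digit.

```csharp
using System;
using System.Collections.Generic;

static class ManualPairParser
{
    // Scans the input once and collects (letter, number) pairs in order of appearance.
    public static List<KeyValuePair<char, int>> Parse(string data)
    {
        var pairs = new List<KeyValuePair<char, int>>();
        int i = 0;
        while (i < data.Length)
        {
            if (!char.IsLetter(data[i])) { i++; continue; }
            char name = data[i++];
            int value = 0;
            bool hasDigits = false;
            // Accumulate the decimal digits that follow the letter.
            while (i < data.Length && data[i] >= '0' && data[i] <= '9')
            {
                value = value * 10 + (data[i] - '0');
                hasDigits = true;
                i++;
            }
            if (hasDigits) pairs.Add(new KeyValuePair<char, int>(name, value));
        }
        return pairs;
    }
}
```

For values in the 0 to 2000 range mentioned in the question there is no risk of integer overflow; longer digit runs would need a guard.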

Split string into multiple alpha and numeric segments

I have a string like "ABCD232ERE44RR". How can I split it into separate segments by letters/numbers. I need:
Segment1: ABCD
Segment2: 232
Segment3: ERE
Segment4: 44
There could be any number of segments. I am thinking of using Regex but don't understand how to write the pattern properly.

You can do it like this;
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var substrings = Regex.Split("ABCD232ERE44RR", @"[^A-Z0-9]+|(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])");
Console.WriteLine(string.Join(",",substrings));
}
}
Output : ABCD,232,ERE,44,RR

I suggest thinking of this as finding matches to a target pattern rather than splitting into the parts you want. Splitting gives significance to the delimiters whereas matching gives significance to the tokens.
You can use Regex.Matches:
Searches the specified input string for all occurrences of a specified regular expression.
var matches = Regex.Matches("ABCD232ERE44RR", "[A-Z]+|[0-9]+");
foreach (Match match in matches) {
Console.WriteLine("Found '{0}' at position {1}", match.Value, match.Index);
}
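If the segments are needed as a collection rather than printed, the matches can be projected into a list. A small sketch along those lines; Segmenter and Segments are illustrative names, and accepting lowercase letters via [A-Za-z] is an assumption that goes beyond the uppercase-only example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class Segmenter
{
    // Returns each run of letters or run of digits as its own segment, in order.
    public static List<string> Segments(string input)
    {
        return Regex.Matches(input, "[A-Za-z]+|[0-9]+")
                    .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

Characters that are neither letters nor digits simply never match, so they act as separators and are dropped from the result.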

Try something like:
([A-Z]+\d*)+

If you decide not to use regex, you can always go the manual route.
const string str = "ABCD232ERE44RR1SGGSG3333GSDGSDG";
var result = new List<StringBuilder>
{
new StringBuilder()
};
char last = str[0];
result.Last().Append(last);
bool isLastNum = Char.IsDigit(last);
for (int i = 1; i < str.Length; i++)
{
char ch = str[i];
if (!((Char.IsDigit(ch) && isLastNum) || (Char.IsLetter(ch) && !isLastNum)))
{
result.Add(new StringBuilder());
}
result.Last().Append(ch);
last = ch;
isLastNum = Char.IsDigit(ch);
}

C# Sort Lithuanian Letters

I need to sort the letters from a file alphabetically. How can I do this? I need a ToString method. Now the console prints:
ABCDEFGIJKLM...ĄČĖĮ...
I need to get this:
AĄBCČDEĘĖFGHIĮYJKLMNOPRSŠTUŲŪVZŽ
Thanks
static char[] Letters(string e) //
{
e = e.ToUpper();
char[] mas = new char[32];
int n = 0;
foreach (char r in e)
if (Array.IndexOf(mas, r) < 0)
if (Char.IsLetter(r))
mas[n++] = r;
Array.Resize(ref mas, n);
Array.Sort(mas);
return mas;
}

You can solve this by sorting the characters using a comparer that understands how to compare characters alphabetically (the default is ordinal comparison).
This implementation is very inefficient, because it converts chars to strings every time it does a compare, but it works:
public class CharComparer : IComparer<char>
{
readonly CultureInfo culture;
public CharComparer(CultureInfo culture)
{
this.culture = culture;
}
public int Compare(char x, char y)
{
return string.Compare(new string(x, 1), 0, new string(y, 1), 0, 1, false, culture);
}
}
(Note: The culture is not actually necessary here; it works without it. I just included it for completeness.)
Then you can use that with sort functions that accept an IComparer, such as Array.Sort():
static void Main()
{
var test = "AĄBCČDEĘĖFGHIĮYJKLMNOPRSŠTUŲŪVZŽ".ToCharArray();
Console.OutputEncoding = System.Text.Encoding.Unicode;
Array.Sort(test);
Console.WriteLine(new string(test)); // Wrong result using default char comparer.
Array.Sort(test, new CharComparer(CultureInfo.GetCultureInfo("lt"))); // Right result using string comparer.
Console.WriteLine(new string(test));
}
An alternative approach is to use an array of single-character strings rather than an array of chars, and sort that instead. This works because the sort functions will use the string comparer, which understands alphabetical order:
var test = "AĄBCČDEĘĖFGHIĮYJKLMNOPRSŠTUŲŪVZŽ".Select(x => new string(x, 1)).ToArray();
Console.OutputEncoding = System.Text.Encoding.Unicode;
Array.Sort(test); // Correct result because it uses the string comparer, which understands alphabetical order.
Console.WriteLine(string.Concat(test));
Or using Linq:
var test = "AĄBCČDEĘĖFGHIĮYJKLMNOPRSŠTUŲŪVZŽ".Select(x => new string(x, 1)).ToArray();
Console.OutputEncoding = System.Text.Encoding.Unicode;
// Correct result because it uses the string comparer, which understands alphabetical order.
test = test.OrderBy(x => x).ToArray();
Console.WriteLine(string.Concat(test));
Using an array of strings instead of an array of chars is probably more performant when sorting like this.
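If the result must not depend on which culture data is installed on the machine, a different technique is to sort by position in an explicit alphabet string instead of using a culture-aware comparer. The sketch below assumes the target order is exactly the alphabet given in the question; LithuanianSort and Alphabet are illustrative names, and letters outside that alphabet (such as Q, W or X) are dropped, which may or may not be what you want.

```csharp
using System;
using System.Linq;

static class LithuanianSort
{
    // Target order, copied from the question.
    const string Alphabet = "AĄBCČDEĘĖFGHIĮYJKLMNOPRSŠTUŲŪVZŽ";

    // Returns the distinct letters of the text, upper-cased, in Alphabet order.
    public static char[] Letters(string text)
    {
        return text.ToUpperInvariant()
                   .Where(c => Alphabet.IndexOf(c) >= 0)   // keep only known letters
                   .Distinct()
                   .OrderBy(c => Alphabet.IndexOf(c))      // sort key is the alphabet position
                   .ToArray();
    }
}
```

IndexOf is called for every element, which is fine for a 32-letter alphabet; for a much larger alphabet a dictionary from character to position would avoid the repeated linear search.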

You could use the following method to remove diacritics:
static string RemoveDiacritics(string text)
{
var normalizedString = text.Normalize(NormalizationForm.FormD);
var stringBuilder = new StringBuilder();
foreach (var c in normalizedString)
{
var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
if (unicodeCategory != UnicodeCategory.NonSpacingMark)
{
stringBuilder.Append(c);
}
}
return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}
Then you can use those chars for the ordering:
string e = "ABCDEFGIJKLM...ĄČĖĮ...";
var normalizedCharList = e.Zip(RemoveDiacritics(e), (chr, n) => new { chr, normValue = (int)n }).ToList();
var orderedChars = normalizedCharList.OrderBy(x => x.normValue).Select(x => x.chr);
string ordered = new String(orderedChars.ToArray());

How do I replace specific characters from a c# string?

If I have a string along the lines of: "user:jim;id:23;group:49st;"
how can I replace the group code (49st) with something else, so that it shows: "user:jim;id:23;group:76pm;"
Sorry if the question is easy, but I haven't found a specific answer, just cases different from mine.

You can use the index of "group" like this
string s = "user:jim;id:23;group:49st;";
string newS = s.Substring(0,s.IndexOf("group:") + 6);
string restOfS = s.IndexOf(";",s.IndexOf("group:") + 6) + 1 == s.Length
? ""
: s.Substring(s.IndexOf(";",s.IndexOf("group:") + 6) + 1);
newS += "76pm;";
s = newS + restOfS;
The line with the s = criteria ? true : false is essentially an if but it is put onto one line using a ternary operator.
Alternatively, if you know what text is there already and what it should be replaced with, you can just use a Replace
s = s.Replace("49st","76pm");
As an added precaution, if you are not always going to have this "group:" part in the string, to avoid errors put this inside an if which checks first
if(s.Contains("group:"))
{
//Code
}

Find the match using regex and replace it with new value in original string as mentioned below:
string str = "user:jim;id=23;group:49st;";
var match = Regex.Match(str, "group:.*;").ToString();
var newGroup = "group:76pm;";
str = str.Replace(match, newGroup);

This solution should work no matter where the group appears in the string:
string input = "user:jim;id:23;group:49st;";
string newGroup = "76pm";
string output = Regex.Replace(input, "(group:)([^;]*)", "${1}"+newGroup);
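One caveat with building the replacement as "${1}" + newGroup: if the new value ever contains a $ sequence, Regex.Replace interprets it as a substitution. A sketch of a more general helper that sidesteps this by using a MatchEvaluator, and that also escapes the key, could look like the following. FieldEditor and ReplaceField are hypothetical names, and the assumption is that fields are separated by ; and written as key:value.

```csharp
using System;
using System.Text.RegularExpressions;

static class FieldEditor
{
    // Replaces the value of "key:value" wherever that field appears in a ;-separated string.
    public static string ReplaceField(string input, string key, string newValue)
    {
        // The lookbehind makes sure the key starts a field, so "group" does not match inside "subgroup".
        string pattern = "(?<=^|;)" + Regex.Escape(key) + ":[^;]*";
        // The evaluator returns the replacement literally; no $-substitution is applied to it.
        return Regex.Replace(input, pattern, m => key + ":" + newValue);
    }
}
```

If the key is absent, the input comes back unchanged, so no separate Contains check is needed.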

Here is a very generic method for splitting your input, changing items, then rejoining items to a string. It is not meant for single replacement in your example, but is meant to show how to split and join items in string.
I used Regex to split the items and then put results into a dictionary.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string pattern = "(?'name'[^:]+):(?'value'.*)";
string input = "user:jim;id:23;group:49st";
Dictionary<string,string> dict = input.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries).Select(x => new
{
name = Regex.Match(x, pattern).Groups["name"].Value,
value = Regex.Match(x, pattern).Groups["value"].Value
}).GroupBy(x => x.name, y => y.value)
.ToDictionary(x => x.Key, y => y.FirstOrDefault());
dict["group"] = "76pm";
string output = string.Join(";",dict.AsEnumerable().Select(x => string.Join(":", new string[] {x.Key, x.Value})).ToArray());
}
}
}

That is just one way to do it. I hope it will help you.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace stringi
{
class Program
{
static void Main(string[] args)
{
//this is your original string
string s = "user:jim;id:23;group:49st";
//string with replace characters
string s2 = "76pm";
//convert string to char array so you can rewrite character
char[] c = s.ToCharArray(0, s.Length);
//assign characters to the right place
c[21] = s2[0];
c[22] = s2[1];
c[23] = s2[2];
c[24] = s2[3];
//this is your new string
string new_s = new string(c);
//output your new string
Console.WriteLine(new_s);
Console.ReadLine();
}
}
}

string a = "user:jim;id:23;group:49st";
string b = a.Replace("49st", "76pm");
Console.Write(b);

Need algorithm to make simple program (sentence permutations)

I really can't work out how to write a simple algorithm in C# to solve my problem. We have a sentence template:
{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}.
My program should generate all the sentences it describes, like:
Hello my mate.
Hello my m8.
Hello my friend.
Hello my friends.
Hi my mate.
...
Hi-Hi my friends.
I know there are a lot of programs that can do this, but I'd like to write it myself. Of course, it should work with this too:
{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}, {i|we} want to {tell|say} you {hello|hi|hi-hi}.

Update: I wasn't too happy about using regexen to parse such simple input, yet I disliked the manual index manipulation jungle found in other answers.
So I replaced the tokenizing with an Enumerator-based scanner with two alternating token-states. This is more justified by the complexity of the input, and has a 'Linqy' feel to it (although it really isn't Linq). I have kept the original Regex based parser at the end of my post for interested readers.
This just had to be solved using Eric Lippert's/IanG's CartesianProduct Linq extension method, in which the core of the program becomes:
public static void Main(string[] args)
{
const string data = @"{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}, {i|we} want to {tell|say} you {hello|hi|hi-hi}.";
var pockets = Tokenize(data.GetEnumerator());
foreach (var result in CartesianProduct(pockets))
Console.WriteLine(string.Join("", result.ToArray()));
}
Using just two regexen (chunks and legs) to do the parsing into 'pockets', it becomes a matter of writing the CartesianProduct to the console :) Here is the full working code (.NET 3.5+):
using System;
using System.Text;
using System.Text.RegularExpressions;
using System.Linq;
using System.Collections.Generic;
namespace X
{
static class Y
{
private static bool ReadTill(this IEnumerator<char> input, string stopChars, Action<StringBuilder> action)
{
var sb = new StringBuilder();
try
{
while (input.MoveNext())
if (stopChars.Contains(input.Current))
return true;
else
sb.Append(input.Current);
} finally
{
action(sb);
}
return false;
}
private static IEnumerable<IEnumerable<string>> Tokenize(IEnumerator<char> input)
{
var result = new List<IEnumerable<string>>();
while(input.ReadTill("{", sb => result.Add(new [] { sb.ToString() })) &&
input.ReadTill("}", sb => result.Add(sb.ToString().Split('|'))))
{
// Console.WriteLine("Expected cumulative results: " + result.Select(a => a.Count()).Aggregate(1, (i,j) => i*j));
}
return result;
}
public static void Main(string[] args)
{
const string data = @"{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}, {i|we} want to {tell|say} you {hello|hi|hi-hi}.";
var pockets = Tokenize(data.GetEnumerator());
foreach (var result in CartesianProduct(pockets))
Console.WriteLine(string.Join("", result.ToArray()));
}
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[] {item}));
}
}
}
Old Regex based parsing:
static readonly Regex chunks = new Regex(@"^(?<chunk>{.*?}|.*?(?={|$))+$", RegexOptions.Compiled);
static readonly Regex legs = new Regex(@"^{((?<alternative>.*?)[\|}])+(?<=})$", RegexOptions.Compiled);
private static IEnumerable<String> All(this Regex regex, string text, string group)
{
return !regex.IsMatch(text)
? new [] { text }
: regex.Match(text).Groups[group].Captures.Cast<Capture>().Select(c => c.Value);
}
public static void Main(string[] args)
{
const string data = @"{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}, {i|we} want to {tell|say} you {hello|hi|hi-hi}.";
var pockets = chunks.All(data, "chunk").Select(v => legs.All(v, "alternative"));
The rest is unchanged

Not sure what you need Linq (@user568262) or "simple" recursion (@Azad Salahli) for. Here's my take on it:
using System;
using System.Text;
class Program
{
static Random rng = new Random();
static string GetChoiceTemplatingResult(string t)
{
StringBuilder res = new StringBuilder();
for (int i = 0; i < t.Length; ++i)
if (t[i] == '{')
{
int j;
for (j = i + 1; j < t.Length; ++j)
if (t[j] == '}')
{
if (j - i < 1) continue;
var choices = t.Substring(i + 1, j - i - 1).Split('|');
res.Append(choices[rng.Next(choices.Length)]);
i = j;
break;
}
if (j == t.Length)
throw new InvalidOperationException("No matching } found.");
}
else
res.Append(t[i]);
return res.ToString();
}
static void Main(string[] args)
{
Console.WriteLine(GetChoiceTemplatingResult(
"{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}, {i|we} want to {tell|say} you {hello|hi|hi-hi}."));
}
}

As others have noted, you can solve your problem by splitting up the string into a sequence of sets, and then taking the Cartesian product of all of those sets. I wrote a bit about generating arbitrary Cartesian products here:
http://blogs.msdn.com/b/ericlippert/archive/2010/06/28/computing-a-cartesian-product-with-linq.aspx
An alternative approach, more powerful than that, is to declare a grammar for your language and then write a program that generates every string in that language. I wrote a long series of articles on how to do so. It starts here:
http://blogs.msdn.com/b/ericlippert/archive/2010/04/26/every-program-there-is-part-one.aspx
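To make the first approach concrete, here is a compact sketch of "split into a sequence of sets, then take the product". It is not the code from the linked articles; TemplateExpander, ParseSlots and Expand are names chosen for illustration, and it assumes braces are not nested and that | only appears inside braces as a separator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TemplateExpander
{
    // Literal text becomes a one-element slot; a {a|b|c} group becomes a slot with one entry per alternative.
    static List<string[]> ParseSlots(string template)
    {
        var slots = new List<string[]>();
        int pos = 0;
        while (pos < template.Length)
        {
            int open = template.IndexOf('{', pos);
            if (open < 0)
            {
                slots.Add(new[] { template.Substring(pos) });
                break;
            }
            if (open > pos)
                slots.Add(new[] { template.Substring(pos, open - pos) });
            int close = template.IndexOf('}', open + 1);
            if (close < 0)
                throw new FormatException("No matching } found.");
            slots.Add(template.Substring(open + 1, close - open - 1).Split('|'));
            pos = close + 1;
        }
        return slots;
    }

    // Builds the Cartesian product slot by slot: every partial sentence is extended by every choice.
    public static IEnumerable<string> Expand(string template)
    {
        IEnumerable<string> results = new[] { "" };
        foreach (var slot in ParseSlots(template))
        {
            var current = slot; // local copy so the lambda captures this iteration's slot
            results = results.SelectMany(prefix => current.Select(choice => prefix + choice));
        }
        return results;
    }
}
```

The sequence is lazy, so sentences are only built as they are enumerated. The number of results is the product of the slot sizes, which grows quickly: the longer example from the question yields 3 * 4 * 2 * 2 * 3 = 144 sentences.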

You can use a Tuple to hold index values of each collection.
For example, you would have something like:
List<string> Greetings = new List<string>()
{
"Hello",
"Hi",
"Hallo"
};
List<string> Targets = new List<string>()
{
"Mate",
"m8",
"friend",
"friends"
};
So now you have your greetings, let's create random numbers and fetch items.
static void Main(string[] args)
{
List<string> Greetings = new List<string>()
{
"Hello",
"Hi",
"Hallo"
};
List<string> Targets = new List<string>()
{
"Mate",
"m8",
"friend",
"friends"
};
var combinations = new List<Tuple<int, int>>();
Random random = new Random();
//Say you want 5 unique combinations.
while (combinations.Count < 5)
{
Tuple<int, int> tmpCombination = new Tuple<int, int>(random.Next(Greetings.Count), random.Next(Targets.Count));
if (!combinations.Contains(tmpCombination))
{
combinations.Add(tmpCombination);
}
}
foreach (var item in combinations)
{
Console.WriteLine("{0} my {1}", Greetings[item.Item1], Targets[item.Item2]);
}
Console.ReadKey();
}

This doesn't look trivial. You need to
1. do some parsing, to extract all the lists of words that you want to combine,
2. obtain all the actual combinations of these words (which is made harder by the fact that the number of lists you want to combine is not fixed)
3. rebuild the original sentence putting all the combinations in the place of the group they came from
part 1 (the parsing part) is probably the easiest: it could be done with a Regex like this
// get all the text within {} pairs
var pattern = @"\{(.*?)\}";
var query = "{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}.";
var matches = Regex.Matches(query, pattern);
// create a List of Lists
var lists = new List<List<string>>();
for(int i=0; i< matches.Count; i++)
{
var nl = matches[i].Groups[1].ToString().Split('|').ToList();
lists.Add(nl);
// build a "template" string like "{0} my {1}"
query = query.Replace(matches[i].Groups[1].ToString(), i.ToString());
}
for part 2 (taking a List of Lists and obtain all resulting combinations) you can refer to this answer
for part 3 (rebuilding your original sentence) you can now take the "template" string you have in query and use String.Format to substitute all the {0}, {1} .... with the combined values from part 2
// just one example,
// you will need to loop through all the combinations obtained from part 2
var OneResultingCombination = new List<string>() {"hi", "mate"};
var oneResult = string.Format(query, OneResultingCombination.ToArray());
