I'm dealing with some legacy data where each record is stored as one large string (one string = one record).
Within each string, the fields are separated by delimiters, and each delimiter also carries a meaning. For example: \vToyota\cBlue\cRed\cWhite\s200mph\oAndrew\oJohn
\v means vehicle, \c is color, \s is speed, \o is owner, and so on.
My task is to reformat the data so that when a characteristic has multiple fields, the repeats are numbered, for example: \vToyota\cBlue\c2Red\c3White\s200mph\oAndrew\o2John
Edit: Alright, @DarrenYoung's suggestion works! Now I have an array of vToyota cBlue cRed cWhite s200mph oAndrew oJohn. I tested the same method on other data and it works too. Now I just need a way to rewrite the first letter of each string whenever it is repeated.
Thank you!
I found this an interesting little puzzle to see what I could do with LINQ. The following seems to work:
private string FixIt(string foo)
{
    var newFoo = "\\" + string.Join("\\",
        foo.Split(new[] {'\\'}, StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(s => s[0],
                (c, g) =>
                {
                    // cnt numbers the entries within one tag group
                    var cnt = 0;
                    return g.Select(x => cnt++ == 0
                        ? x
                        : x[0] + cnt.ToString() + x.Substring(1));
                })
            .SelectMany(g => g));
    return newFoo;
}
Input: \vToyota\cBlue\cRed\cWhite\s200mph\oAndrew\oJohn
Output: \vToyota\cBlue\c2Red\c3White\s200mph\oAndrew\o2John
That SelectMany is a handy thing to remember.
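One thing to keep in mind with the GroupBy version: it emits all fields of one tag together, so if repeated tags are not adjacent in the input, the field order changes. A minimal sketch of an order-preserving variant follows. The class and method names (RecordRenumberer, Renumber) are made up for illustration, and it assumes every field starts with a backslash followed by a one-character tag.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RecordRenumberer
{
    // Numbers repeated tags in place: \cBlue\cRed becomes \cBlue\c2Red.
    // Assumption: each field is a backslash, a one-character tag, then the value.
    public static string Renumber(string record)
    {
        var seen = new Dictionary<char, int>();   // tag -> occurrences so far
        var result = new StringBuilder();

        foreach (var field in record.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries))
        {
            char tag = field[0];
            int count;
            seen.TryGetValue(tag, out count);     // count stays 0 for a new tag
            seen[tag] = ++count;

            result.Append('\\').Append(tag);
            if (count > 1) result.Append(count);  // the first occurrence keeps the bare tag
            result.Append(field, 1, field.Length - 1);
        }
        return result.ToString();
    }
}
```

Calling RecordRenumberer.Renumber on the sample input gives the same output as FixIt, and an interleaved record such as \cBlue\vToyota\cRed keeps its field order.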
Because I thought this question was interesting, I wrote a program that does what I believe is a reasonable solution. I started with a few principal assumptions:
In "old data" situations you probably don't know every option that is going to show up in the records. Consequently, whatever approach is taken needs to accommodate new types of delimiters and tags quickly and easily. For that reason I did not use a string.Split approach (even though it is easier to read). Instead, all tokens are declared at the beginning of the file. Anything can be a token, whether or not it has a "\" in front of it.
The solution needs to gracefully handle records that don't conform to the standards.
Parsing integers for repeated entries needs to be something you can disable per record type. Speed, for example, doesn't seem to be able to appear multiple times per record, so setting the value for speed to false in the ALLOW_MULTIPLE list turns this parsing off, and the leading digits of a value such as 200mph are not mistaken for an index.
In my solution I also created separate classes for readability and so the code could be quickly investigated. Although I would not suggest that this is production ready, the following should go a long way towards solving the issue. Best of luck!
// Just paste the rest of this into a new console application to see it work!
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
public class Program
{
private static readonly List<string> TOKENS = new List<string> {@"\v", @"\c", @"\o", @"\s"};
private static readonly List<string> DISPLAY = new List<string> {"Vehicle", "Color", "Owner", "Speed"};
private static readonly List<bool> ALLOW_MULTIPLE = new List<bool> {false, true, true, false};
private class RecordEntry
{
public string Value { get; set; }
public int Index { get; set; }
public string DataType { get; set; }
public override string ToString() { return DataType + ": " + Value; }
}
private class ParsedRecord
{
private List<RecordEntry> entries = new List<RecordEntry>();
public List<RecordEntry> Entries { get { return entries; } }
}
public static void Main(string[] args)
{
// sample records (second has a \m which is ignored since it isn't a recognized token)
var records = new[] {@"\vToyota\cBlue\c2Red\c3White\s200mph\oAndrew\o2John",
@"\vChevy\c2Orange\cGreen\s50mph\o2Bob\mWhite"};
var parsedData = new List<ParsedRecord>();
foreach (var record in records)
{
// character by character parsing
var currentParseRecord = new ParsedRecord();
parsedData.Add(currentParseRecord);
var currentRecord = new StringBuilder(record);
var currentToken = new StringBuilder();
for (var parseIdx = 0; parseIdx < currentRecord.Length; parseIdx++)
{
currentToken.Append(currentRecord[parseIdx]);
var recordIdx = 0;
var index = TOKENS.IndexOf(currentToken.ToString());
if (index < 0) continue;
// current char is used up now (was part of the token)
parseIdx++;
if (ALLOW_MULTIPLE[index] && currentRecord.Length > parseIdx + 1)
{
// assuming less than 10 records max - if more, would need to pull multiple numeric values here
if (!Int32.TryParse(currentRecord[parseIdx] + "", out recordIdx)) recordIdx = 0;
else parseIdx++;
}
// find the next token or end of string
int valueLength = FindNextToken(currentRecord, parseIdx) - parseIdx;
if (valueLength <= 0) valueLength = currentRecord.Length - parseIdx;
currentParseRecord.Entries.Add(new RecordEntry
{
DataType = DISPLAY[index],
Index = recordIdx,
Value = currentRecord.ToString(parseIdx, valueLength)
});
parseIdx += valueLength - 1;
currentToken.Clear();
}
}
}
private static int FindNextToken(StringBuilder value, int currentIndex)
{
for (var searchIdx = currentIndex; searchIdx < value.Length; searchIdx++) {
if (TOKENS.Any(checkToken => value.Length > searchIdx + checkToken.Length &&
value.ToString(searchIdx, checkToken.Length) == checkToken)) {
return searchIdx;
}
}
return -1;
}
}
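For comparison, here is a shorter sketch of the same idea under a stricter assumption that the program above deliberately avoids: every tag is a backslash followed by exactly one letter. The tag table, the class name TaggedRecordParser, and the "Name[index]: value" output format are all made up for illustration. Unlike the single-digit TryParse above, it consumes every leading digit, so indexes above 9 work.

```csharp
using System;
using System.Collections.Generic;

public static class TaggedRecordParser
{
    // Hypothetical tag table: tag letter -> (display name, may a numeric index follow?)
    private static readonly Dictionary<char, KeyValuePair<string, bool>> Tags =
        new Dictionary<char, KeyValuePair<string, bool>>
        {
            { 'v', new KeyValuePair<string, bool>("Vehicle", false) },
            { 'c', new KeyValuePair<string, bool>("Color", true) },
            { 'o', new KeyValuePair<string, bool>("Owner", true) },
            { 's', new KeyValuePair<string, bool>("Speed", false) },
        };

    // Returns one "Name[index]: value" line per recognized field; unknown tags are skipped.
    public static List<string> Parse(string record)
    {
        var entries = new List<string>();
        foreach (var field in record.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries))
        {
            KeyValuePair<string, bool> tag;
            if (!Tags.TryGetValue(field[0], out tag)) continue;

            int pos = 1;
            int index = 0;
            if (tag.Value)
            {
                // Consume all leading digits as the index (0 means "no index given").
                while (pos < field.Length && field[pos] >= '0' && field[pos] <= '9')
                {
                    index = index * 10 + (field[pos] - '0');
                    pos++;
                }
            }
            entries.Add(tag.Key + "[" + index + "]: " + field.Substring(pos));
        }
        return entries;
    }
}
```

Because Speed is marked as not allowing an index, \s50mph is reported as "Speed[0]: 50mph" rather than having its digits stripped.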
I have a string for example
string text = "xfoofoobarbar fooxxfoo barxxxfoo";
This string contains foo 5 times. It is the longest, most frequently repeated substring with at least 2 characters, so it's my desired result.
bar appears only 3 times, so it's not the most frequent substring.
oo also appears 5 times, but foo is longer, so foo is preferred.
XababaY would result in ab, which occurs 2 times (no overlapping; ba also occurs 2 times but is ignored because ab comes first).
XaaaaaaaY would result in aa, because aa appears 3 times and has the most repetitions.
I would love to show what I've tried so far, but I honestly have no idea where to start. LINQ? Regex?
A hint in the right direction would help me too.
I would say the first place to start here is to generate a list of all the possible substrings from the input of length 2 to the length of the input:
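string text = "xfoofoobarbar fooxxfoo barxxxfoo";
// Lengths 2..text.Length (Range takes a start and a count, not an end)
var allSubstrings = Enumerable.Range(2, text.Length - 1)
    .ToDictionary(k => k, v => FindSubStrings(text, v));
...
IEnumerable<string> FindSubStrings(string input, int length)
{
    // <= so the substring that ends at the last character is included
    for (var i = 0; i <= input.Length - length; i++)
    {
        yield return input.Substring(i, length);
    }
}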
Live example: http://rextester.com/ZUR68480
From there it should be as simple as grouping by the substring to get a count, and ordering the result appropriately. But your requirements seem to pick and choose between "longest length" and "most occurrences"; you can't have both!
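One way to reconcile the two criteria is to rank by count first and use length only as a tie-breaker, counting occurrences without overlap. The following is a minimal brute-force sketch of that reading of the question, not a tuned algorithm; the names RepeatedSubstring, Find and CountNonOverlapping are made up for illustration.

```csharp
using System;

public static class RepeatedSubstring
{
    // Returns the substring (length >= 2) with the most non-overlapping occurrences.
    // Ties go to the longer substring, then to the one that starts earliest.
    // Returns null when nothing occurs at least twice.
    public static string Find(string text)
    {
        string best = null;
        int bestCount = 0;
        for (int start = 0; start + 2 <= text.Length; start++)
        {
            for (int len = 2; start + len <= text.Length; len++)
            {
                string candidate = text.Substring(start, len);
                int count = CountNonOverlapping(text, candidate);
                // If this prefix is not repeated, no longer substring from here is either.
                if (count < 2) break;
                if (count > bestCount || (count == bestCount && len > best.Length))
                {
                    best = candidate;
                    bestCount = count;
                }
            }
        }
        return best;
    }

    private static int CountNonOverlapping(string text, string value)
    {
        int count = 0;
        int pos = 0;
        while ((pos = text.IndexOf(value, pos, StringComparison.Ordinal)) >= 0)
        {
            count++;
            pos += value.Length;   // skip past this match so matches cannot overlap
        }
        return count;
    }
}
```

With this ranking, the three inputs from the question give foo, ab and aa respectively.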
Here is my full implementation. Note that it ranks by length first and count second, so for this input it picks xxfoo (2 occurrences) as the winner rather than foo.
public static void Main(string[] args)
{
    string text = "xfoofoobarbar fooxxfoo barxxxfoo";
    // For each length, keep the most frequent substring of that length
    var allSubstrings = Enumerable.Range(2, text.Length - 2)
        .Select(x => {
            var mostFrequent = FindSubStrings(text, x).GroupBy(y => y).OrderByDescending(y => y.Count()).First();
            return new Substrings {
                Length = x,
                Count = mostFrequent.Count(),
                Value = mostFrequent.Key
            };
        })
        .ToList();
    foreach (var item in allSubstrings)
    {
        Console.WriteLine(item.Length + ":" + item.Count + ":" + item.Value);
    }
    // Prefer the longest repeated substring; break ties on count
    var best = allSubstrings.Where(x => x.Count > 1).OrderByDescending(x => x.Length).ThenByDescending(x => x.Count).First();
    Console.WriteLine("Longest, most frequent substring is " + best.Value);
}
public class Substrings
{
    public int Length { get; set; }
    public int Count { get; set; }
    public string Value { get; set; }
}
private static IEnumerable<string> FindSubStrings(string input, int length)
{
    // <= so the substring that ends at the last character is included
    for (var i = 0; i <= input.Length - length; i++)
    {
        yield return input.Substring(i, length);
    }
}
Live example: http://rextester.com/RJNP55827
Using the standard template I managed to make a custom highlighter which turns all occurrences of the string "Archive?????Key" (where ???? is any collection of characters that are allowed in variable names) pink. However what I would really like is for the "Archive" and "Key" portions to become pink and for the "????" portion to become maroon. As far as I understand VSIX highlighters (and I really don't) this means defining two ClassificationFormatDefinitions, but every time I try I just break the project.
My GetClassificationSpans method (which is the only significant deviation from the standard template) looks like:
public IList<ClassificationSpan> GetClassificationSpans(SnapshotSpan span)
{
List<ClassificationSpan> spans = new List<ClassificationSpan>();
string text = span.GetText();
int idx0 = 0;
int idx1;
while (true)
{
idx0 = text.IndexOf(keyPrefix, idx0);
if (idx0 < 0)
break;
idx1 = text.IndexOf(keySuffix, idx0 + 6);
if (idx1 < 0)
break;
// TODO: make sure the prefix and suffix are part of the same object identifier.
string name = text.Substring(idx0 + lengthPrefix, idx1 - idx0 - lengthPrefix);
string full = text.Substring(idx0, idx1 - idx0 + keySuffix.Length);
SnapshotSpan span0 = new SnapshotSpan(span.Start + idx0, idx1 - idx0 + lengthSuffix);
SnapshotSpan span1 = new SnapshotSpan(span.Start + idx0 + lengthPrefix, idx1 - idx0 - lengthPrefix);
SnapshotSpan span2 = new SnapshotSpan(span.Start + idx1, lengthSuffix);
spans.Add(new ClassificationSpan(span0, classificationType));
spans.Add(new ClassificationSpan(span1, classificationType)); // I'd like to assign a different IClassificationType to this span.
spans.Add(new ClassificationSpan(span2, classificationType));
idx0 = idx1 + 5;
}
return spans;
}
And span1 is where I want to assign a different style. I do not understand how the Classifier, Format, Provider, and Definition classes needed to do this one (!) thing relate to each other and which ones can be made aware of multiple styles.
The templates are OK for getting started, but usually it's simpler to reimplement everything more directly once you know what direction you're going in.
Here's how all the pieces fit together:
A classifier (really, an IClassificationTag tagger) yields classification tag-spans for a given section of a text buffer on demand.
Classification tag-spans consist of the span in the buffer that the tag applies to, and the classification tag itself. The classification tag simply specifies a classification type to apply.
Classification types are used to relate tags of that classification to a given format.
Formats (specifically, ClassificationFormatDefinitions) are exported via MEF (as EditorFormatDefinitions) so that VS can discover them and use them to colour spans that have the associated classification type. They also (optionally) appear in the Fonts & Colors options.
A classifier provider is exported via MEF in order for VS to discover it; it gives VS a means of instantiating your classifier for each open buffer (and thus discovering the tags in it).
So, what you're after is code that defines and exports two classification format definitions associated to two classification types, respectively. Then your classifier needs to produce tags of both types accordingly. Here's an example (untested):
public static class Classifications
{
// These are the strings that will be used to form the classification types
// and bind those types to formats
public const string ArchiveKey = "MyProject/ArchiveKey";
public const string ArchiveKeyVar = "MyProject/ArchiveKeyVar";
// These MEF exports define the types themselves
[Export]
[Name(ArchiveKey)]
private static ClassificationTypeDefinition ArchiveKeyType = null;
[Export]
[Name(ArchiveKeyVar)]
private static ClassificationTypeDefinition ArchiveKeyVarType = null;
// These are the format definitions that specify how things will look
[Export(typeof(EditorFormatDefinition))]
[ClassificationType(ClassificationTypeNames = ArchiveKey)]
[UserVisible(true)] // Controls whether it appears in Fonts & Colors options for user configuration
[Name(ArchiveKey)] // This could be anything but I like to reuse the classification type name
[Order(After = Priority.Default, Before = Priority.High)] // Optionally include this attribute if your classification should
// take precedence over some of the builtin ones like keywords
public sealed class ArchiveKeyFormatDefinition : ClassificationFormatDefinition
{
public ArchiveKeyFormatDefinition()
{
ForegroundColor = Color.FromRgb(0xFF, 0x69, 0xB4); // pink!
DisplayName = "This will display in Fonts & Colors";
}
}
[Export(typeof(EditorFormatDefinition))]
[ClassificationType(ClassificationTypeNames = ArchiveKeyVar)]
[UserVisible(true)]
[Name(ArchiveKeyVar)]
[Order(After = Priority.Default, Before = Priority.High)]
public sealed class ArchiveKeyVarFormatDefinition : ClassificationFormatDefinition
{
public ArchiveKeyVarFormatDefinition()
{
ForegroundColor = Color.FromRgb(0xB0, 0x30, 0x60); // maroon
DisplayName = "This too will display in Fonts & Colors";
}
}
}
The provider:
[Export(typeof(ITaggerProvider))]
[ContentType("text")] // or whatever content type your tagger applies to
[TagType(typeof(ClassificationTag))]
public class ArchiveKeyClassifierProvider : ITaggerProvider
{
[Import]
public IClassificationTypeRegistryService ClassificationTypeRegistry { get; set; }
public ITagger<T> CreateTagger<T>(ITextBuffer buffer) where T : ITag
{
return buffer.Properties.GetOrCreateSingletonProperty(() =>
new ArchiveKeyClassifier(buffer, ClassificationTypeRegistry)) as ITagger<T>;
}
}
Finally, the tagger itself:
public class ArchiveKeyClassifier : ITagger<ClassificationTag>
{
public event EventHandler<SnapshotSpanEventArgs> TagsChanged;
private Dictionary<string, ClassificationTag> _tags;
public ArchiveKeyClassifier(ITextBuffer subjectBuffer, IClassificationTypeRegistryService classificationRegistry)
{
// Build the tags that correspond to each of the possible classifications
_tags = new Dictionary<string, ClassificationTag> {
{ Classifications.ArchiveKey, BuildTag(classificationRegistry, Classifications.ArchiveKey) },
{ Classifications.ArchiveKeyVar, BuildTag(classificationRegistry, Classifications.ArchiveKeyVar) }
};
}
public IEnumerable<ITagSpan<ClassificationTag>> GetTags(NormalizedSnapshotSpanCollection spans)
{
if (spans.Count == 0)
yield break;
foreach (var span in spans) {
if (span.IsEmpty)
continue;
foreach (var identSpan in LexIdentifiers(span)) {
var ident = identSpan.GetText();
if (!ident.StartsWith("Archive") || !ident.EndsWith("Key"))
continue;
var varSpan = new SnapshotSpan(
identSpan.Start + "Archive".Length,
identSpan.End - "Key".Length);
yield return new TagSpan<ClassificationTag>(new SnapshotSpan(identSpan.Start, varSpan.Start), _tags[Classifications.ArchiveKey]);
yield return new TagSpan<ClassificationTag>(varSpan, _tags[Classifications.ArchiveKeyVar]);
yield return new TagSpan<ClassificationTag>(new SnapshotSpan(varSpan.End, identSpan.End), _tags[Classifications.ArchiveKey]);
}
}
}
private static IEnumerable<SnapshotSpan> LexIdentifiers(SnapshotSpan span)
{
// Tokenize the string into identifiers and numbers, returning only the identifiers
var s = span.GetText();
for (int i = 0; i < s.Length; ) {
if (char.IsLetter(s[i])) {
var start = i;
for (++i; i < s.Length && IsTokenChar(s[i]); ++i);
yield return new SnapshotSpan(span.Start + start, i - start);
continue;
}
if (char.IsDigit(s[i])) {
for (++i; i < s.Length && IsTokenChar(s[i]); ++i);
continue;
}
++i;
}
}
private static bool IsTokenChar(char c)
{
return char.IsLetterOrDigit(c) || c == '_';
}
private static ClassificationTag BuildTag(IClassificationTypeRegistryService classificationRegistry, string typeName)
{
return new ClassificationTag(classificationRegistry.GetClassificationType(typeName));
}
}
One more note: In order to accelerate startup, VS keeps a cache of MEF exports. However, this cache is often not invalidated when it should be. Additionally, if you change the default colour of an existing classification format definition, there's a good chance your change won't get picked up because VS saves the previous values in the registry. To mitigate this, it's best to run a batch script in between compiles when anything MEF- or format-related changes to clear things. Here's an example for VS2013 and the Exp root suffix (used by default when testing VSIXes):
@echo off
del "%LOCALAPPDATA%\Microsoft\VisualStudio\12.0Exp\ComponentModelCache\Microsoft.VisualStudio.Default.cache" 2> nul
rmdir /S /Q "%LOCALAPPDATA%\Microsoft\VisualStudio\12.0Exp\ComponentModelCache" 2> nul
reg delete HKCU\Software\Microsoft\VisualStudio\12.0Exp\FontAndColors\Cache\{75A05685-00A8-4DED-BAE5-E7A50BFA929A} /f
I use this method, SearchConsequences, to iterate through a List<ValuesEO> of objects and, according to the applied rules, get the values of particular fields. I want to simplify this code somehow.
I want to replace the expression ValuesEO[i].powerR with ValuesEO[i].otherField everywhere in the block of code.
At the moment I do this by copying the block and changing it manually. So at the end I have, let's say, 5 really similar blocks of code in this method. The only difference is that one uses ValuesEO[i].otherField, the next ValuesEO[i].otherField2, then ValuesEO[i].otherField3, and so on.
I don't like that block copying.
public Dictionary<Consequence,Cause> SearchConsequences(List<ResultsCatcher> smallTable, int n, ConnectHYSYS obj, int keyP, int keyR)//for one stream for one parameter
{
double threshold = 0.005;
Dictionary<Consequence,Cause> collection = new Dictionary<Consequence,Cause>();
//search in ValuesE for each energy stream, for powerR
for (int i = 0; i < smallTable[n].ValuesE.Count; i++)
{
//sort the smallTable
smallTable.Sort((x, y) => x.ValuesE[i].powerR.CompareTo(y.ValuesE[i].powerR));
//get the index of first occurrence of powerR >= threshold, if there is nothing bigger than threshold, index is null
var tagged = smallTable.Select((item, ii) => new { Item = item, Index = (int?)ii });
int? index = (from pair in tagged
where pair.Item.ValuesE[i].powerR >= threshold
select pair.Index).FirstOrDefault();
//get needed information
if (index != null)
{
int id = index.Value;
double newValue = smallTable[id].ValuesE[i].power;
double newValueR = smallTable[id].ValuesE[i].powerR;
TypeOfValue kindOf = TypeOfValue.power;
Consequence oneConsequence = new Consequence(obj.EnergyStreamsList[i], newValue, newValueR, kindOf);
Cause oneCause = new Cause();
oneCause.GetTableHeader(smallTable[id]);
collection.Add(oneConsequence,oneCause);
}
}
return collection;
}
Maybe this is easy to accomplish and has already been discussed somewhere,
but I really don't even know how to google it.
This is a ready-made program that demonstrates how to move your criteria/property selection out of your examination function. Have a look at how the field criteria are matched.
using System;
using System.Collections.Generic;
using System.Linq;
namespace Test
{
class Program
{
public class PowerValues
{
public double power;
public double powerR;
public double lightbulbs;
public double lightbulbsR;
}
public static void DoSomething(IEnumerable<PowerValues> powerValues, Func<PowerValues, double> criteria, double threshold)
{
var flaggedElements = powerValues.Where(e => criteria(e) > threshold);
foreach (var flagged in flaggedElements)
{
Console.WriteLine("Value flagged: {0}", criteria(flagged));
}
}
public static void Main(string[] args)
{
List<PowerValues> powerValues = new List<PowerValues>();
powerValues.Add(new PowerValues(){power=10, powerR=0.002, lightbulbs = 2, lightbulbsR = 2.006});
powerValues.Add(new PowerValues(){power=5, powerR=0.004, lightbulbs = 4, lightbulbsR = 2.09});
powerValues.Add(new PowerValues(){power=6, powerR=0.003, lightbulbs = 3, lightbulbsR = 2.016});
Console.WriteLine("Power matching criteria . . . ");
DoSomething(powerValues, (e) => e.powerR, 0.003);
Console.WriteLine("Lightbulbs matching criteria . . . ");
DoSomething(powerValues, (e) => e.lightbulbs, 3);
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
}
}
}
Extract the code you're using twice into one method.
Create an enum for the values you want (e.g. PowerR, OtherField)
Add the enum as a parameter to your method
Add a switch statement in your method to the places where the code changes
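The four steps above can be sketched roughly as follows. The types here (Field, Values, FieldSearch) and the simplified search are placeholders invented for illustration; they are not the classes from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical enum naming the fields the shared method can work on.
public enum Field { PowerR, OtherField }

// Hypothetical stand-in for the question's value objects.
public class Values
{
    public double PowerR;
    public double OtherField;
}

public static class FieldSearch
{
    // The switch is the only place where the code differs per field.
    public static double Get(Values v, Field field)
    {
        switch (field)
        {
            case Field.PowerR: return v.PowerR;
            case Field.OtherField: return v.OtherField;
            default: throw new ArgumentOutOfRangeException("field");
        }
    }

    // Smallest value of the chosen field that is >= threshold, or null if there is none.
    public static double? SmallestAtOrAbove(IEnumerable<Values> items, Field field, double threshold)
    {
        return items.Select(v => (double?)Get(v, field))
                    .Where(x => x >= threshold)
                    .OrderBy(x => x)
                    .FirstOrDefault();
    }
}
```

The caller then writes FieldSearch.SmallestAtOrAbove(items, Field.PowerR, 0.005) and the same call with Field.OtherField, instead of keeping two copies of the block. Compared with passing a Func, the enum keeps the set of allowed fields closed, at the cost of touching the switch whenever a field is added.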
My ASP.NET application requires me to generate a huge number of random strings, such that each contains at least one alphabetic and one numeric character and is alphanumeric on the whole.
My logic is to generate the code again if the random string is numeric:
public static string GenerateCode(int length)
{
if (length < 2 || length > 32)
{
throw new RSGException("Length cannot be less than 2 or greater than 32.");
}
string newcode = Guid.NewGuid().ToString("n").Substring(0, length).ToUpper();
return newcode;
}
public static string GenerateNonNumericCode(int length)
{
string newcode = string.Empty;
try
{
newcode = GenerateCode(length);
}
catch (Exception)
{
throw;
}
while (IsNumeric(newcode))
{
return GenerateNonNumericCode(length);
}
return newcode;
}
public static bool IsNumeric(string str)
{
bool isNumeric = false;
try
{
long number = Convert.ToInt64(str);
isNumeric = true;
}
catch (Exception)
{
isNumeric = false;
}
return isNumeric;
}
While debugging it works properly, but when I ask it to create 10,000 random strings, it isn't able to handle it properly. When I export that data to Excel, I find on average at least 20 strings that are numeric.
Is it a problem with my code or C#? - Mine.
If anyone's looking for code,
public static string GenerateCode(int length)
{
if (length < 2)
{
throw new A1Exception("Length cannot be less than 2.");
}
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var random = new Random();
var result = new string(
Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)])
.ToArray());
return result;
}
public static string GenerateAlphaNumericCode(int length)
{
string newcode = string.Empty;
try
{
newcode = GenerateCode(length);
while (!IsAlphaNumeric(newcode))
{
newcode = GenerateCode(length);
}
}
catch (Exception)
{
throw;
}
return newcode;
}
public static bool IsAlphaNumeric(string str)
{
// Require at least one letter AND at least one digit. A single "[0-9A-Z]+"
// match would also accept strings that are all digits or all letters.
return Regex.IsMatch(str, "[A-Z]") && Regex.IsMatch(str, "[0-9]");
}
Thanks to all for your ideas.
If you want to stick with the Guid as the generator, you could always validate using a Regex.
This will only match if at least one alphabetic character is present:
Regex reg = new Regex("[a-zA-Z]+");
Then just use the IsMatch method to see if your string is valid.
That way you don't need the (IMHO rather ugly) try..catch around the Convert.
Update: I see your subsequent comment about this actually making your code slower. Are you instantiating the Regex object only once, or every time the test is done? If the latter, this will be rather inefficient, and you should consider using a "lazy-loaded" property on your class, e.g.
private Regex reg;
private Regex AlphaRegex
{
get
{
if (reg == null) reg = new Regex("[a-zA-Z]+");
return reg;
}
}
Then just use AlphaRegex.IsMatch() in your method. I would expect this to make a difference.
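A variation on the same idea, in case the check lives in static code (as the question's methods do): a static readonly field is initialized once by the runtime, so no null check is needed. This is only a sketch; the class name CodeValidator and the method HasLetter are made up here, and RegexOptions.Compiled is an optional choice that trades startup cost for faster matching.

```csharp
using System.Text.RegularExpressions;

public static class CodeValidator
{
    // Built once, when the class is first used; static initializers run thread-safely in .NET.
    private static readonly Regex AlphaRegex = new Regex("[a-zA-Z]", RegexOptions.Compiled);

    // True when the code contains at least one ASCII letter.
    public static bool HasLetter(string code)
    {
        return AlphaRegex.IsMatch(code);
    }
}
```

For example, CodeValidator.HasLetter("12A4") is true and CodeValidator.HasLetter("1234") is false.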
Add using System.Linq; and work with a normal string.
Check whether the string contains at least one letter and at least one digit.
using System.Linq;
string StrCheck = "abcd123";
// the string has letters ---> StrCheck.Any(char.IsLetter)
// the string has digits ---> StrCheck.Any(char.IsDigit)
if (StrCheck.Any(char.IsLetter) && StrCheck.Any(char.IsDigit))
{
//statement goes here.....
}
Sorry for the late reply...
I didn't quite understand what you want in the string besides letters (abc etc.); let's say numbers.
You can generate a random character as follows:
Random r = new Random();
char lower = (char)r.Next('a', 'z' + 1); // lowercase; the upper bound of Next is exclusive
char upper = (char)r.Next('A', 'Z' + 1); // capitals
// or you can convert lowercase to capital:
char c = (char)('k' + ('A' - 'a'));
If you want to create a string:
var s = new StringBuilder();
for(int i = 0; i < length; ++i)
s.Append((char)r.Next('a', 'z' + 1)); //Changed to char
return s.ToString();
Note: I don't know ASP.NET so much, so I just act like it's C#.
To answer your question strictly, using your existing code: there is a problem with your recursion logic, which can be avoided by not using recursion (there is absolutely no reason to use recursion in GenerateNonNumericCode). Do the following instead:
public static string GenerateNonNumericCode(int length)
{
string newcode = GenerateCode(length);
while (IsNumeric(newcode))
{
newcode = GenerateCode(length);
}
return newcode;
}
Other General Notes
Your code is very inefficient--throwing exceptions is expensive, so using try/catch in a loop is therefore slow and pointless. As others have suggested, regex makes more sense (System.Text.RegularExpressions namespace).
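As a rough illustration of an exception-free check (a plain loop rather than a regex), something like the sketch below would do. The class name NumericCheck is made up here. Note one behavioural difference from the Convert.ToInt64 version: Convert.ToInt64 throws an OverflowException for digit strings beyond the Int64 range (more than 19 digits), which the original code then reports as "not numeric", whereas this loop has no length limit.

```csharp
public static class NumericCheck
{
    // True when the string is non-empty and consists only of the ASCII digits 0-9.
    public static bool IsNumeric(string str)
    {
        if (string.IsNullOrEmpty(str)) return false;
        foreach (char c in str)
        {
            if (c < '0' || c > '9') return false;
        }
        return true;
    }
}
```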
Is it a problem with my code or C#?
When in doubt, the problem is almost never C#.
So, I would change the code to this:
static Random r = new Random();
public static string GenerateNonNumericCodeFaster(int length) {
var firstLength = r.Next(0, length - 1);
var secondLength = length - 1 - firstLength;
return GenerateCode(firstLength)
+ (char) r.Next((int)'A', (int)'G')
+ GenerateCode(secondLength);
}
You can keep your GenerateCode function almost as is: drop its length < 2 check, because this version calls it with lengths as small as 0. Everything else you toss out. The idea, of course, is that rather than testing whether the string contains an alphabetic character, you just explicitly put one in. In my tests, this code generated 10,000 8-character strings in 0.0172963 seconds, compared to around 52 seconds for your code. So, yeah, this is about 3000 times faster :)
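The same "put one in" idea can be pushed one step further so that both a letter and a digit are guaranteed, which is what the question asks for. The following is only a sketch under assumptions: the class name CodeGenerator is invented, the alphabet is A-Z plus 0-9 rather than Guid hex digits, and System.Random is used, which is fine for codes like these but is not cryptographically secure.

```csharp
using System;

public static class CodeGenerator
{
    private const string Letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private const string Digits = "0123456789";
    private const string All = Letters + Digits;

    // Builds a code that always contains at least one letter and one digit.
    // Pass in one shared Random instead of creating a new one per call.
    public static string Generate(int length, Random random)
    {
        if (length < 2) throw new ArgumentOutOfRangeException("length");

        var chars = new char[length];
        chars[0] = Letters[random.Next(Letters.Length)];   // guaranteed letter
        chars[1] = Digits[random.Next(Digits.Length)];     // guaranteed digit
        for (int i = 2; i < length; i++)
        {
            chars[i] = All[random.Next(All.Length)];
        }

        // Fisher-Yates shuffle so the guaranteed characters are not always in front.
        for (int i = length - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);
            char tmp = chars[i];
            chars[i] = chars[j];
            chars[j] = tmp;
        }
        return new string(chars);
    }
}
```

No retry loop is needed, because every generated string already satisfies the requirement.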
I was asked to write a StringToInt / int.Parse function on the whiteboard in a job interview last week. I did not perform very well, but I came up with some sort of solution. Later, back home, I wrote one in Visual Studio, and I wonder if there is any better solution than mine below.
I have not bothered with any error handling beyond checking that the string contains only digits.
private int StrToInt(string tmpString)
{
int tmpResult = 0;
System.Text.Encoding ascii = System.Text.Encoding.ASCII;
byte[] tmpByte = ascii.GetBytes(tmpString);
for (int i = 0; i <= tmpString.Length-1; i++)
{
// Check whether the character is a valid digit ('0' is 48, '9' is 57)
if (tmpByte[i] > 47 && tmpByte[i] < 58)
// Here I'm using the length-1 of the string to set the power and multiply this by the value
tmpResult += (tmpByte[i] - 48) * ((int)Math.Pow(10, (tmpString.Length-i)-1));
else
throw new Exception("Non valid character in string");
}
return tmpResult;
}
I'll take a contrarian approach.
public static int? ToInt(this string mightBeInt) // extension methods must be static, inside a static class
{
int convertedInt;
if (int.TryParse(mightBeInt, out convertedInt))
{
return convertedInt;
}
return null;
}
After being told that this wasn't the point of the question, I'd argue that the question tests C coding skills, not C#. I'd further argue that treating strings as arrays of characters is a very bad habit in .NET, because strings are Unicode, and in any application that might be globalized, making any assumption at all about character representations will get you in trouble sooner or later. Further, the framework already provides a conversion method, and it will be more efficient and reliable than anything a developer would toss off in such a hurry. It's always a bad idea to re-invent framework functionality.
Then I would point out that by writing an extension method, I've created a very useful extension to the string class, something that I would actually use in production code.
If that argument loses me the job, I probably wouldn't want to work there anyway.
EDIT: As a couple of people have pointed out, I missed the "out" keyword in TryParse. Fixed.
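To make the Unicode point concrete, here is a small sketch (the class name DigitDemo is invented for illustration). It compares three views of a character: whether char.IsDigit accepts it, what the common c - '0' trick yields, and the decimal value reported by CharUnicodeInfo.GetDecimalDigitValue. For the Arabic-Indic digit three (U+0663), char.IsDigit is true but c - '0' is nowhere near 3, which is the kind of trap a hand-rolled parser that pairs char.IsDigit with c - '0' can fall into.

```csharp
using System;
using System.Globalization;

public static class DigitDemo
{
    // Summarizes how a character looks to char.IsDigit, to the c - '0' trick,
    // and to the Unicode database.
    public static string Describe(char c)
    {
        return string.Format("IsDigit={0}, c-'0'={1}, decimal value={2}",
            char.IsDigit(c), c - '0', CharUnicodeInfo.GetDecimalDigitValue(c));
    }
}
```

DigitDemo.Describe('7') reports 7 in both numeric columns, while DigitDemo.Describe('\u0663') reports 1587 for c - '0' and 3 for the decimal value.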
Converting to a byte array is unnecessary, because a string is already an array of chars. Also, magic numbers such as 48 should be avoided in favor of readable constants such as '0'. Here's how I'd do it:
int result = 0;
for (int i = str.Length - 1, factor = 1; i >= 0; i--, factor *= 10)
result += (str[i] - '0') * factor;
For each character (starting from the end), add its numeric value times the correct power of 10 to the result. The power of 10 is calculated by multiplying it with 10 repeatedly, instead of unnecessarily using Math.Pow.
I think your solution is reasonably OK, but instead of using Math.Pow, I would do:
tmpResult = 10 * tmpResult + (tmpByte[i] - 48);
Also, check the length against the length of tmpByte rather than tmpString. Not that it normally should matter, but it is rather odd to loop over one array while checking the length of another.
And, you could replace the for loop with a foreach statement.
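Putting those suggestions together (multiply-and-add instead of Math.Pow, a foreach over the characters) might look like the sketch below. The names IntParser and ParseDigits are made up for illustration, and the checked expression is an addition of mine so that values beyond int.MaxValue raise an OverflowException instead of silently wrapping. It handles unsigned ASCII digits only.

```csharp
using System;

public static class IntParser
{
    // Parses a string of ASCII digits, accumulating left to right:
    // result = result * 10 + digit.
    public static int ParseDigits(string s)
    {
        if (string.IsNullOrEmpty(s)) throw new FormatException("Empty input");

        int result = 0;
        foreach (char c in s)
        {
            if (c < '0' || c > '9') throw new FormatException("Not a digit: " + c);
            // checked makes arithmetic overflow throw OverflowException.
            result = checked(result * 10 + (c - '0'));
        }
        return result;
    }
}
```

For example, IntParser.ParseDigits("1234") returns 1234, while "2147483648" throws an OverflowException and "12x" throws a FormatException.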
If you want a simple implementation that doesn't use the framework's parsing methods, how about this:
"1234".Aggregate(0, (s,c)=> c-'0'+10*s)
...and a note that you'd better be sure that the string consists solely of decimal digits before using this method.
Alternately, use an int? as the aggregate value to deal with error handling:
"12x34".Aggregate((int?)0, (s,c)=> c>='0'&&c<='9' ? c-'0'+10*s : null)
...this time with the note that empty strings evaluate to 0, which may not be most appropriate behavior - and no range checking or negative numbers are supported; both of which aren't hard to add but require unpretty looking wordy code :-).
Obviously, in practice you'd just use the built-in parsing methods. I actually use the following extension method and a bunch of nearly identical siblings in real projects:
public static int? ParseAsInt32(this string s, NumberStyles style, IFormatProvider provider) {
int val;
if (int.TryParse(s, style, provider, out val)) return val;
else return null;
}
Though this could be expressed slightly shorter using the ternary ? : operator doing so would mean relying on side-effects within an expression, which isn't a boon to readability in my experience.
Just because I like LINQ:
string t = "1234";
var result = t.Select((c, i) => (c - '0') * Math.Pow(10, t.Length - i - 1)).Sum();
I agree with Cyclon Cat, they probably want someone who will utilize existing functionality.
But I would write the method a little bit different.
public static int? ToInt(this string mightBeInt) // extension methods must be static, inside a static class
{
int number = 0;
if (Int32.TryParse(mightBeInt, out number))
return number;
return null;
}
Int32.TryParse does not allow properties to be given as out parameter.
I was asked this question over 9000 times on interviews :) This version is capable of handling negative numbers and handles other conditions very well:
public static int ToInt(string s)
{
bool isNegative = false, gotAnyDigit = false;
int result = 0;
foreach (var ch in s ?? "")
{
if(ch == '-' && !(gotAnyDigit || isNegative))
{
isNegative = true;
}
else if(char.IsDigit(ch))
{
result = result*10 + (ch - '0');
gotAnyDigit = true;
}
else
{
throw new ArgumentException("Not a number");
}
}
if (!gotAnyDigit)
throw new ArgumentException("Not a number");
return isNegative ? -result : result;
}
and a couple of lazy tests:
[TestFixture]
public class Tests
{
[Test]
public void CommonCases()
{
foreach (var sample in new[]
{
new {e = 123, s = "123"},
new {e = 110, s = "000110"},
new {e = -011000, s = "-011000"},
new {e = 0, s = "0"},
new {e = 1, s = "1"},
new {e = -2, s = "-2"},
new {e = -12223, s = "-12223"},
new {e = int.MaxValue, s = int.MaxValue.ToString()},
new {e = int.MinValue, s = int.MinValue.ToString()}
})
{
Assert.AreEqual(sample.e, Impl.ToInt(sample.s));
}
}
[Test]
public void BadCases()
{
var samples = new[] { "1231a", null, "", "a", "-a", "-", "12-23", "--1" };
var errCount = 0;
foreach (var sample in samples)
{
try
{
Impl.ToInt(sample);
}
catch(ArgumentException)
{
errCount++;
}
}
Assert.AreEqual(samples.Length, errCount);
}
}