Unidentified char preventing me from parsing coordinates into an int - C#

OK, I'll try to explain the problem, although it's going to be a bit hard.
I'm trying to parse some information from a page that contains coordinates, and copy-pasting it gives you something like this:
Distance Position
5.8 ‎‭(‭‭77‬‬|‭-‭2‬‬)‬‎
6.3 ‎‭(‭‭76‬‬|‭-‭1‬‬)‬‎
7.8 ‎‭(‭‭76‬‬|‭‭6‬‬)‬‎
9.2 ‎‭(‭‭91‬‬|‭‭3‬‬)‬‎
9.5 ‎‭(‭‭79‬‬|‭‭10‬‬)‬‎
12.2 ‎‭(‭‭80‬‬|‭‭13‬‬)‬‎
15 ‎‭(‭‭82‬‬|‭-‭14‬‬)‬‎
15 ‎‭(‭‭81‬‬|‭‭16‬‬)‬‎
Now, the problem I have is that between the "(" and the number there is an unidentified character: if you press the right arrow key the cursor won't move, but if you press it a few times it will.
I haven't encountered this anywhere before, and the website is in PHP, if that helps.
Also, if it helps, when I copy-paste the information in here the character disappears and I can move freely through the text.
Please help me with this problem; it's causing my software to malfunction because I'm trying to parse the coordinates into an int, and that character makes the parse fail with a FormatException.

While viewing in UTF-8 I see nothing, but after changing the encoding to ANSI I am left with:
5.8 ‎‭(‭‭77‬‬|‭-‭2‬‬)‬‎
6.3 ‎‭(‭‭76‬‬|‭-‭1‬‬)‬‎
7.8 ‎‭(‭‭76‬‬|‭‭6‬‬)‬‎
9.2 ‎‭(‭‭91‬‬|‭‭3‬‬)‬‎
9.5 ‎‭(‭‭79‬‬|‭‭10‬‬)‬‎
12.2 ‎‭(‭‭80‬‬|‭‭13‬‬)‬‎
15 ‎‭(‭‭82‬‬|‭-‭14‬‬)‬‎
15 ‎‭(‭‭81‬‬|‭‭16‬‬)‬‎
You seem to have picked up the left-to-right mark (U+200E) and related directional formatting characters, and the encoding was swapped once or twice.
My first guess would be that your browser settings are not correct (wrong encoding set). Since the data comes from a website, you can still clean it up.
Code:
Regex rgx = new Regex(@"[^a-zA-Z0-9_\n %\[\]\.\(\)&-]");
data = rgx.Replace(data, "");
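If you go that route, note that the whitelist above would also strip the | separator between the two coordinates, so you may want to add it. Alternatively, here is a minimal sketch (the sample string and names are illustrative, not from the question) that removes only the invisible directional marks, which all fall into the Unicode Format category, and then parses the pair:
using System;
using System.Globalization;
using System.Linq;

// Sample value with embedded directional marks (U+200E, U+202D, U+202C, ...).
string raw = "\u200E\u202D(\u202D\u202D77\u202C\u202C|\u202D-\u202D2\u202C\u202C)\u202C\u200E";

// Remove every Unicode "format" character (category Cf), which covers the
// left-to-right mark and its relatives, then parse the pair.
string cleaned = new string(raw
    .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.Format)
    .ToArray());                                   // "(77|-2)"

string[] parts = cleaned.Trim('(', ')').Split('|');
int x = int.Parse(parts[0]);                       // 77
int y = int.Parse(parts[1]);                       // -2
Console.WriteLine($"x={x}, y={y}");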

Related

How to efficiently store Huffman Tree and Encoded binary string into a file?

I can easily convert a character string into a Huffman tree and then encode it into a binary sequence.
How should I save these so that I can actually compress the original data and later recover it?
I searched the web, but I could only find guides and answers covering what I have already done. How can I take the Huffman algorithm further and actually achieve lossless compression?
I am using C# for this project.
EDIT: This is what I've achieved so far; it might need rethinking.
I am attempting to compress a text file. I use the Huffman algorithm, but there are some key points I couldn't figure out:
"aaaabbbccdef", when compressed, gives this encoding:
Key = a, Value = 11
Key = b, Value = 01
Key = c, Value = 101
Key = d, Value = 000
Key = e, Value = 001
Key = f, Value = 100
11111111010101101101000001100 is the encoded version. It normally needs 12*8 bits, but we've compressed it to 29 bits. This example might be a little unnecessary for a file this small, but let me explain what I tried to do.
We have 29 bits here, but we need 8*n bits, so I pad the encodedString with zeros until its length is a multiple of eight. Since I can add anywhere from 1 to 7 zeros, a single byte is more than enough to record how many. In this case I've added 3 zeros:
11111111010101101101000001100000
Then I prepend, as a binary byte, how many extra bits I've added, and split the result into 8-bit pieces:
00000011-11111111-01010110-11010000-01100000
Turn these into ASCII characters
ÿVÐ`
Now, if I have the encoding table, I can read the first 8 bits, convert them to an integer ignoreBits, and, by ignoring the last ignoreBits bits, turn the data back into its original form.
The problem is that I also want to include an uncompressed version of the encoding table in this file, to have a fully functional ZIP/UNZIP program, but I am having trouble deciding where my ignoreBits ends, where my encodingTable starts/ends, and where the encoded bits start/end.
I thought about using a null character, but there is no assurance that the Values cannot produce a null character: "ddd" in this situation produces 00000000-0...
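For reference, the packing step described above boils down to something like this (a sketch; the PackBits helper is illustrative, not the exact code from the project):
static byte[] PackBits(string bits)
{
    // Pad with zeros up to a multiple of 8 bits (3 zeros for the 29-bit example),
    // then prepend one byte recording how many padding bits must be ignored.
    int pad = (8 - bits.Length % 8) % 8;
    bits = bits.PadRight(bits.Length + pad, '0');

    var bytes = new byte[1 + bits.Length / 8];
    bytes[0] = (byte)pad;                              // the "ignoreBits" header byte
    for (int i = 0; i < bits.Length; i++)
        if (bits[i] == '1')
            bytes[1 + i / 8] |= (byte)(1 << (7 - i % 8));
    return bytes;                                      // 0x03 0xFF 0x56 0xD0 0x60 for the example
}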
Your representation of the code needs to be self-terminating; then you know the next bit is the start of the Huffman codes. One way is to traverse the tree that resulted from the Huffman code, writing a 0 bit for each branch, or a 1 bit followed by the symbol for a leaf. When the traversal is done, you know the next bit must be the start of the codes.
You also need to make your data self-terminating. Note that in the example you give, the three added zero bits will be decoded as another 'd', so you will incorrectly get 'aaaabbbccdefd' as the result. You need to either precede the encoded data with a count of the symbols expected, or add a symbol to your encoded set, with frequency 1, that marks the end of the data.
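A minimal sketch of that self-terminating header in C# (the Node type and helper names are illustrative, not from the question): pre-order traversal, writing a 0 bit for an internal node and a 1 bit plus the 8-bit symbol for a leaf. The symbol count for the data can then be written right after the tree.
using System.Collections.Generic;

class Node
{
    public byte Symbol;
    public Node Left, Right;   // both null => leaf
}

// Pre-order: 0 = internal node (then left subtree, then right), 1 = leaf + 8 symbol bits.
static void WriteTree(Node n, List<bool> bits)
{
    if (n.Left == null && n.Right == null)
    {
        bits.Add(true);
        for (int i = 7; i >= 0; i--)
            bits.Add(((n.Symbol >> i) & 1) == 1);
    }
    else
    {
        bits.Add(false);
        WriteTree(n.Left, bits);
        WriteTree(n.Right, bits);
    }
}

// Reads the tree back; pos ends up at the first bit after the tree,
// which is exactly where the encoded symbols (or the symbol count) begin.
static Node ReadTree(IReadOnlyList<bool> bits, ref int pos)
{
    if (bits[pos++])                       // leaf
    {
        byte sym = 0;
        for (int i = 0; i < 8; i++)
            sym = (byte)((sym << 1) | (bits[pos++] ? 1 : 0));
        return new Node { Symbol = sym };
    }
    var node = new Node();                 // internal node
    node.Left = ReadTree(bits, ref pos);
    node.Right = ReadTree(bits, ref pos);
    return node;
}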

Console.Write makes Sound

Today I came across something that actually scared me when it happened.
I was reading a file in as a byte array and printing each byte out, converted to a char, like the following:
byte[] bytes = System.IO.File.ReadAllBytes(fileName);
foreach (byte bt in bytes)
{
    Console.Write((char)bt + " ");
}
The thing is that printing the converted values to the console actually made a sound in my headset and my general audio output.
When I then clicked into the console to stop the execution, after a few seconds there was a Windows notification sound, like when you get an update or something like that.
My question now is: why is this happening?
Also note that I tested File.ReadAllBytes with an mp4 file first and then with a .zip. With a plain .txt file it doesn't seem to happen.
Also, I am using Windows 10.
Thanks to the comments I was able to figure out that a bell character was actually being printed, which caused Windows 10 to make basically endless beeping sounds.
I now check for the hex value 0x07 before emitting the character; after setting a breakpoint it turned out that it actually is in the byte array, and when printed it made the sound.
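A minimal way to keep the output readable and silent (a sketch based on the loop above; the '.' placeholder is arbitrary) is to substitute control bytes before printing:
byte[] bytes = System.IO.File.ReadAllBytes(fileName);
foreach (byte bt in bytes)
{
    // 0x07 is the ASCII BEL control code; writing it to the console triggers a beep.
    // Replace it (and the other control bytes) with a visible placeholder instead.
    char c = bt < 0x20 ? '.' : (char)bt;
    Console.Write(c + " ");
}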
Thanks everyone, I am not cursed after all ;) :)
PS:
I used the German wiki page to get the hex value:
https://de.wikipedia.org/wiki/Steuerzeichen
On the English one I couldn't find it.

Microsoft.CognitiveServices.Speech.SpeechRecognizer-getting time offsets of results in a file with continuous recognition

I'm testing out the new unified speech engine on Azure, and I'm working on a piece where I'm trying to transcribe a 10 minute audio file. I've created a recognizer with CreateSpeechRecognizerWithFileInput, and I've kicked off continuous recognition with StartContinuousRecognitionAsync. I created the recognizer with detailed results enabled.
In the FinalResultsReceived event, there doesn't seem to be a way to access the audio offset in the SpeechRecognitionResult. If I do this though:
string rawResult = ea.Result.ToString(); // can get access to the raw value this way
Regex r = new Regex(@".*Offset"":(\d*),.*");
int offset = Convert.ToInt32(r.Match(rawResult).Groups[1].Value);
Then I can extract the offset. The raw result looks something like this:
ResultId:4116b361141446a98f306fdc11c3a5bd Status:Recognized Recognized text:<OK, so what's your think it went well, let's look at number number is 104-828-1198.>. Json:{"Duration":129500000,"NBest":[{"Confidence":0.887861133,"Display":"OK, so what's your think it went well, let's look at number number is 104-828-1198.","ITN":"OK so what's your think it went well let's look at number number is 104-828-1198","Lexical":"OK so what's your think it went well let's look at number number is one zero four eight two eight one one nine eight","MaskedITN":"OK so what's your think it went well let's look at number number is 104-828-1198"}],"Offset":6900000,"RecognitionStatus":"Success"}
The challenge is that the Offset is sometimes zero even for segments that are well past the start of the file, so I get zeroes in the middle of a recognition stream.
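A slightly more robust variant of the same hack (a sketch assuming Json.NET; RawSpeechResult is just an illustrative DTO) would be to cut the Json: payload out of the ToString() output and deserialize it:
// Illustrative DTO matching the fields visible in the raw result above.
class RawSpeechResult
{
    public string RecognitionStatus { get; set; }
    public long Offset { get; set; }
    public long Duration { get; set; }
}

// ...
string raw = ea.Result.ToString();
string json = raw.Substring(raw.IndexOf("Json:") + "Json:".Length);
var parsed = Newtonsoft.Json.JsonConvert.DeserializeObject<RawSpeechResult>(json);
long offsetTicks = parsed.Offset;   // 6900000 in the example above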
I also tried submitting the same file through the batch transcription API, which gives me a different result entirely:
{
    "RecognitionStatus": "Success",
    "Offset": 531700000,
    "Duration": 91300000,
    "NBest": [{
        "Confidence": 0.87579143,
        "Lexical": "OK so what's your think it went well let's look at number number is one zero four eight two eight one",
        "ITN": "OK so what's your think it went well let's look at number number is 1048281",
        "MaskedITN": "OK so what's your think it went well let's look at number number is 1048281",
        "Display": "OK, so what's your think it went well, let's look at number number is 1048281."
    }]
},
So I have three questions on this:
Is there a supported method to get the offset of a recognized section of a file in the recognizer API? The SpeechRecognitionResult doesn't expose this, nor does the Best() extension.
Why is the offset coming back as 0 for a segment part way through the file?
What are the units for the offsets in the bulk recognition and file recognition APIs, and why are they different? They don't appear to be ms or frames, at least from what I've found in Audacity. The result I posted was from roughly 59s into the file, which is roughly 800k samples.
Chris,
Thanks for your feedback. To your questions,
1) The offset as well as the duration have been added to the API. The next release (coming very soon) will allow you to access both properties. Please stay tuned.
2) This is probably due to a different recognition mode being used. We will also fix that in the next release.
3) The time unit for both APIs is 100 ns (one tick). Please also note that batch transcription uses a different model than online recognition, so the recognition results might be slightly different.
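To illustrate the unit: 100 ns is exactly one .NET TimeSpan tick, so the offsets shown above convert directly (a quick sketch):
// 100 ns units are the same as .NET ticks, so TimeSpan.FromTicks converts directly.
var realtimeOffset = TimeSpan.FromTicks(6900000);    // 00:00:00.6900000 -> 0.69 s
var batchOffset    = TimeSpan.FromTicks(531700000);  // 00:00:53.1700000 -> 53.17 s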
Sorry for the inconvenience!
Thanks,

AI movement 100% like in the Game MAFIA on the C64

Update
Link to the C# Solution Project from a user who answered me:
https://github.com/TedDawkin/C64_AI_Movement
Update
I found it. If someone is interested, just use the link to the thread in the C64 forum, http://www.lemon64.com/forum/viewtopic.php?p=712902#712902, where I discussed the topic (with myself). Funny how simple it was in the end. Not so funny that it took me over two weeks.
I will post the BASIC-to-C# code later and "answer" my own question.
If someone cares and doesn't want to go to the forum link, here is how it works in C64 BASIC.
30110 IF KS(S)=0 THEN GOSUB 30400 : GOTO30105
30400 POKE 211,20: POKE214,18: SYS CS: PRINT"SPIELER"F
30405 Y=INT(KP(S,F)/40)
SYS CR, KP(S,F)-40*Y, Y
X=PEEK(UA)-1
Y=PEEK(UA+1)-40
30410 IF PEEK(UA+2)=1 OR PEEK (UA+3)=1 GOTO 30420
30415 IFINT(RND(1)*2)=0ORGW(KS(S),F)<4GOTO30450
30420 IFX=0THENX=Y:GOTO30215
30421 IFY<>0GOTO30450
30425 GOTO30215
30450 IFX<>0ANDRI(F)<>(1+2*(X=1))THENP=X:GOSUB30490:IFP=0THENRETURN
30451 IFY<>0ANDRI(F)<>(40+80*(Y=1))THENP=Y:GOSUB30490:IFP=0THENRETURN
30455 IFX<>0ANDRI(F)<>(1+2*(X=1))GOTO30460
30456 IFRI(F)<>-1THENP=1:GOSUB30490:IFP=0THENRETURN
30457 IFRI(F)<>1THENP=-1:GOSUB30490:IFP=0THENRETURN
30458 GOTO30465
30460 IFY<>0ANDRI(F)<>(40+80*(Y=1))GOTO30465
30461 IFRI(F)<>-40THENP=40:GOSUB30490:IFP=0THENRETURN
30462 IFRI(F)<>40THENP=-40:GOSUB30490:IFP=0THENRETURN
30465 RETURN
30490 Q=KP(S,F)+P:IFQ<0ORQ>520OR(PEEK(BR+Q)<>32ANDPEEK(BR+Q)<>96)THENRETURN
30491 POKEBR+KP(S,F),32:KP(S,F)=Q:POKEBR+KP(S,F),193:POKEFR+KP(S,F),6
30492 RI(F)=P:P=0:RETURN
some hints:
X = moved right (1), left (-1), up (-40) or down (40)
P = position. There is no Y because the next/previous line is 40 characters away. (C64 screen = 40 columns and 25 rows)
S = switches between 0 and 2 to determine whether it's the human's or the AI's turn
KP(S,F) = offset position in the video memory address
BR = start address of the video memory
32 = 0x20 = space, used to clear the old position
193 = character code used as the pawn for player and AI
6 = marks the field as AI (the human position is marked with 0x0b)
F = player number
RI(F) = don't know yet
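To make the position arithmetic concrete, here is a minimal C# sketch (the helper names are illustrative, not from the original) of the row/column conversion the BASIC does with Y=INT(KP/40) and KP-40*Y:
// The C64 screen is 40 columns x 25 rows; a position is a single offset 0..999
// into video memory. Converting to and from column/row:
static (int col, int row) ToColRow(int offset) => (offset % 40, offset / 40);
static int ToOffset(int col, int row) => row * 40 + col;

// Moving one cell: right = +1, left = -1, down = +40, up = -40 on the offset.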
Original Question:
Working on a remake and trying to understand the C64 code, I am struggling with the AI movement. After working on it for over a week, I can't reproduce exactly the same behavior.
To answer my question: yes, an AI can act randomly by using RND, and I totally overlooked that.
For the in-depth discussion, go to the forum thread.
There were other stumbling blocks; these notes will maybe help if you want to debug/reverse engineer some stuff:
I always read the BASIC code without syntax highlighting and formatting. By doing this I missed some things and made errors.
Positions were marked in screen memory.
Positions were not Point(x,y) or array[x,y] but an integer 0-999 representing the 25 rows and 40 columns of the C64 screen.
With the assembler part, I went into too much detail. It would have been better to look at 5 to 10 instructions in a row and think about what could happen (without debugging), instead of getting lost in JSRs while debugging.

Incorrect Lambda Expression Indentation

I've been having this problem for a while in Visual Studio 2013. It doesn't seem to understand how to apply the indentation rules properly to lambda expressions when they've been lined up incorrectly. Here is a simplified example:
var s = new Action(() =>
   {
   });
In the second and third rows, the indent is only 3 spaces instead of 4 (the real code is much, much larger, with the inner expression spanning hundreds of lines; it was checked in by my colleague and I'm trying to fix it). I've tried every combination of reformatting the code/document, re-creating the curly brace, etc. Nothing seems to work. It refuses to automatically update the indentation properly.
I normally wouldn't bother with it, but it causes all the code inside to be off by one character as well. When I'm typing lines in the middle, the Tab/Shift+Tab stops are one character off from the lines above and below, and I constantly have to adjust to get things lined up again. The closest thing I can find referencing this issue is a Connect feedback item from 2013 that is supposedly fixed, but I'm on Update 4 (released November 2014) and still experiencing the issue.
Short of manually going through and updating the indentation of every line in the lambda expression, does anyone have an idea how I can quickly fix this code?
Blatantly ignoring the Visual Studio issue and providing a workaround right away: hold Alt to enable block (column) selection, select all the affected lines, and type a single space to push them over by one character. (The original answer illustrated the effect of typing "Hello World!" into a block selection with screenshots.)
As a 'rant': a single lambda should not contain hundreds of lines of code; it is a very big no-no, maintainability-wise.
