Convert text into PDU format

Convert text into PDU format - c#

I am developing a message server like thing which support PDU format (using android phone) to send messages. I have used online encoders to convert my text but i don't know the real steps to convert a text into PDU format I don't think it just a Hexadecimal number.
I used at commands for sending messages from hyper terminal.
Can someone help?
I used AT-Commands:
at
at+cmgf=0
at+cmgs=25 (Length i guess)
>"encoded message"

The AT+CMGS command is defined in the 3GPP 25.005 standard, and for PDU mode its syntax is given as
+CMGS=<length><CR>
PDU is given<ctrl-Z/ESC>
and in the description it is further specified
the PDU shall be hexadecimal format (similarly as specified for <pdu>)
and given in one line; ME/TA converts this coding into the actual
octets of PDU.
The <pdu> format is defined in Message Data Parameters in chapter 3.1 Parameter Definitions:
In the case of SMS: 3GPP TS 24.011 [6] SC address followed by
3GPP TS 23.040 [3] TPDU in hexadecimal format: ME/TA converts each
octet of TP data unit into two IRA character long hexadecimal number
(e.g. octet with integer value 42 is presented to TE as two characters
2A (IRA 50 and 65))
(SC is short for Service Centre)
And here all the fun begins because you now have to dig really, really, really deep into those other specifications to uncover the actual format...
For instance 24.011 describes low level data format for messages sent between the mobile and the network where only parts of it is relevant in this context.
7.3.1.2 RP‑DATA (Mobile Station to Network) This message is sent in MS ‑> MSC direction. The message is used to relay the TPDUs. The
information elements are in line with 3GPP TS 23.040.
and in the table given, the two last rows are the relevant parts, the Service Centre Address and the TPDU.
Information element, Reference, Presence, Format, Length
RP‑Message Type, Subclause 8.2.2, M, V, 3 bits
RP‑Message Reference, Subclause 8.2.3, M, V, 1 octet
RP‑Originator Address, Subclause 8.2.5.1, M, LV, 1 octet
RP‑Destination Address, Subclause 8.2.5.2, M, LV, 1‑12 octets
RP‑User Data, Subclause 8.2.5.3, M, LV, <= 233 octets
Trying to dig further I got stuck on trying to figure out the value of the RP‑Destination Address number IEI, and I have already spent a long time writing this answer. Sorry for stopping here. The actual phone number encoding is the "normal" Called party BCD number encoding (10.5.4.7 in 24.008) and TON+NPI is the same as the <type> argument in AT+CPBW for instance. And encoding of the text is a whole story on it own...
Trying to decipher parts of the 3GPP specifications can sometimes be really hard and the possibilities for misinterpretation might be close to endless! If you are really set on developing your own code for doing this you are probably better off by starting to read good PDU mode introductions like
http://mobiletidings.com/2009/02/11/more-on-the-sms-pdu/1.
Or look up the code in an already existing library/program that handles PDU mode2.
1
Notice that good quality articles like that are far between, if the text does not include references to detailed/technical terms from the 3GPP standards that is usually a low quality indicator.
2
Again, look hard for good quality.

Related

How to send a long(160 or more characters) text message?

Everyone
I am trying to write a code in c# where I could send a text message that has 160 or more characters using the GSMComm libray.
What I've done is I divided my message into parts/message and send them to my clients. The problem is, the clients find it irritating.
So, is there a way to send a long text message?
*Update
I found this on their website:
Q: How can I send long (concatenated) text messages?
A: GSMComm implements a part of the "Smart Messaging" standard defined by Nokia. The methods for it are implemented in the GsmComm.PduConverter.SmartMessaging.SmartMessageFactory class. It supports creating long messages for standard SMS text as well as for Unicode messages (Built-in Unicode conversion starts with Version 1.61).
But I can't find their documentation so i don't know how to use the SmartMessaging.

From here:
If using HTTP, you gotta set the MLC to 2. Message length control.
Determines system behavior when the message length exceeds limits set
by the mobile operator. 0 – Reject the MT if message text > maximum
allowed for the target operator. 1 – Truncate the MT if message text >
maximum allowed for the target operator. 2 – Automatically create
multiple MTs dividing the message text at the point(s) where message
text length = maximum allowed for the target operator.

Identifying repeating sequences of data in byte array

Given a sample of hexadecimal data, I would like to identify UNKNOWN sequences of bytes that are repeated throughout the sample. (Not searching for a known string or value) I am attempting to reverse engineer a network protocol, and I am working on determining data structures within the packet. As an example of what I'm trying to do (albeit on a smaller scale):
(af:b6:ea:3d:83:02:00:00):{21:03:00:00}:[b3:49:96:23:01]
{21:03:00:00}:(af:b6:ea:3d:83:02:00:00):01:42:00:00:00:00:01:57
And
(38:64:88:6e:83:02:00:00):{26:03:00:00}:[b3:49:96:23:01]
{26:03:00:00}:(38:64:88:6e:83:02:00:00):01:42:00:00:00:00:00:01
Obviously, these are easy to spot by eye, but patterns that are hundreds of chars into the data are not. I'm not expecting a magic bullet for the solution, just a nudge in the right direction, or even better, a premade tool.
I'm currently needing this for a C# project, but I am open to any and all tools.

If you have no idea what you are looking for, you could get an idea of the layout of the data by performing a negative entropy analysis on a reasonably large enough sample of conversations to see the length of the records/sub-records.
If the data is structured with repeated sequences of roughly the same length and content type you should see clusters of values with nearly the same negative entropy around the length of the record and sub records.
For example if you put a basic file with a lot of the same data through that, you should see values around the average record length with comparable negentropies (ex: if you use a CSV file with an average line length of 117 bytes, you might see 115, 116, 117 & 119 with the highest negentropy), and values around the most common field lengths with the same negentropy.
You might do a byte occurence scan, to see which byte values are likely separators.
There is a free hex editor with sources which does that for you (hexplorer, in the Crypto/Find Pattern menu). You may have to change the default font through Options to actually something in the UI.

C# best way to differentiate socket messages

Im new to sockets, and Im creating a tictactoe online, I know how to make the connections with the clients and the server, but I will make a chat too.
Then I doing this, when a user chat I send a message with a prefix "CHAT: HELLO WORLD"
and when a user make a move I send a message without the prefix... this is the best way?
THX!!!

In defining a wire protocol over a stream-based protocol like TCP, you have a few options for constructing messages:
Fixed-length
All messages are the same length; every sequence of x bytes represents a new message.
Length-prefixed (variable length)
The first byte(s) of the message represent the length of the payload to follow.
String-terminated (variable length)
Read bytes from the stream until you come to a specified byte-string that represents the end of a message, i.e. the newline character \n.
If you ever intend on changing the protocol (protip: you will, even if you don't think you will), it is crucial that you include an identifier for the protocol version in each message to prevent issues when dealing with clients using an older iteration of the protocol. Clearly, this is the first thing you must determine before deciphering the rest of the payload, so this should be the first byte(s) of the message (following any length-prefix) - how could we determine the version if we don't know where it is located in every message we receive?

Typically you would go with a format that includes a packet length, type and payload.
In your case you could go with a Byte (type), Int16 (length), Byte[] (payload).
The type can be represented in code as an enum. Length would just represent the length of the payload.
public enum Byte PacketType {
PlayerMove = 1,
PlayerChat = 2
}

You need to define a protocol. Remember to allow room for additional features :-).
Eg. using regular expressions over complete lines (end with selected line terminator):
Matching ^:[a-c][1-3]:: is a move (colon, position, colon user name).
Matching ^!.*?:: is a chat message (exclamation point, name, colon, text).
and anything else (in V1) is an error.
Remember:
Data is sent in packets, you might need multiple reads from the socket to get a complete message.
Avoid ambiguity: resolving it might be x or y is hard.
Specify a text encoding (eg. UTF-8).

I assume you're using TCP?
You need to make sure you 'frame' both messages so you can identify them and also avoid potential blocking issues (in case the client stops sending while you are still expecting to read CHAT: or whatever you define). With TCP your byte order is guaranteed but reading does not guarantee a complete 'packet' so you'll need to implement some way of building up a buffer and identifying when your 'message' is complete.
A reasonably simple way of doing this is to make sure each 'message' has a header with the type and size specified.
EG:
Enumerate your message types (move and chat currently), so say 'chat' is 0x01 and your message is 1020 bytes. You can prefix your 'message' with 0x0103FC so the server knows how many bytes to expect, and build up a buffer using async socket calls until the 1020 bytes are read (or you arbitrarily decide that the client is not sending anymore)

Ascii range regards binary files?

ive been reading about this topic and didnt get the specific info for my question :
(maybe the following is incorrect - but please do correct me)
Every file( text/binary) is saving BYTES.
byte is 8 bits hence max value is 2^8-1 = 255 codes.
those 255 codes divides to 2 groups:
0..127 : textual chars
128:..255 : special chars.
so binary file contains char codes from the whole range : 0..255 ( ascii chars+special chars).
1 ) correct ?
2) NOw , lets say im saving one INT in binary file. ( 4 byte in 32 bit system)
how does the file tells the progem reads it : its not 4 single unrelated bytes but an int which is 4 bytes ?

Underlying all files are being stored as bytes, so in a sense what you're saying is correct. However, if you open a file that's intended to be read as binary and try to read it in a text editor, it will look like gibberish.
How does a program know whether to read a file as text or as binary? (ie as special sets of ASCII or other encoded bytes, or just as the underlying bytes with a different representation)?
Well, it doesn't know - it just does what it's told.
In Windows, you open .txt files in notepad - notepad expects to be reading text. Try opening a binary file in notepad. It will open, you will see stuff, but it will be rubbish.
If you're writing your own program you can write using BinaryWriter and read using BinaryReader if you want to store everything as binary. What would happen if you wrote using BinaryWriter and read using StringReader?
To answer your specific example:
using (var test = new BinaryWriter(new FileStream(#"c:\test.bin", FileMode.Create)))
{
test.Write(10);
test.Write("hello world");
}
using (var test = new BinaryReader(new FileStream(#"c:\test.bin", FileMode.Open)))
{
var out1 = test.ReadInt32();
var out2 = test.ReadString();
Console.WriteLine("{0} {1}", out1, out2);
}
See how you have to read in the same order that's written? The file doesn't tell you anything.
Now switch the second part around:
using (var test = new BinaryReader(new FileStream(#"c:\test.bin", FileMode.Open)))
{
var out1 = test.ReadString();
var out2 = test.ReadInt32();
Console.WriteLine("{0} {1}", out1, out2);
}
You'll get gibberish out (if it works at all). Yet there is nothing you can read in the file that will tell you that beforehand. There is no special information there. The program must know what to do based on some out of band information (a specification of some sort).

so binary file contains char codes from the whole range : 0..255 ( ascii chars+special chars).
No, a binary file just contains bytes. Values between 0 and 255. They should only be considered as character at all if you decide to ascribe that meaning to them. If it's a binary file (e.g. a JPEG) then you shouldn't do that - a byte 65 in image data isn't logically an 'A' - it's whatever byte 65 means at that point in the file.
(Note that even text files aren't divided into "ASCII characters" and "special characters" - it depends on the encoding. In UTF-16, each code unit takes two bytes regardless of its value. In UTF-8 the number of bytes depends on the character you're trying to represent.)
how does the file tells the progem reads it : its not 4 single unrelated bytes but an int which is 4 bytes ?
The file doesn't tell the program. The program has to know how to read the file. If you ask Notepad to open a JPEG file, it won't show you an image - it will show you gibberish. Likewise if you try to force an image viewer to open a text file as if it were a JPEG, it will complain that it's broken.
Programs reading data need to understand the structure of the data they're going to read - they have to know what to expect. In some cases the format is quite flexible, like XML: there are well-specified layers, but then the program reads the values with higher-level meaning - elements, attributes etc. In other cases, the format is absolutely precise: first you'll start with a 4 byte integer, then two 2-byte integers or whatever. It depends on the format.
EDIT: To answer your specific (repeated) comment:
Im Cmd shell....youve written your binary file. I have no clue what did you do there. how am i suppose to know whether to read 4 single bytes or 4 bytes as once ?
Either the program reading the data needs to know the meaning of the data or it doesn't. If it's just copying the file from one place to another, it doesn't need to know the meaning of the data. It doesn't matter whether it copies it one byte at a time or all four bytes at once.
If it does need to know the meaning of the data, then just knowing that it's a four byte integer doesn't really help much - it would need to know what that integer meant to do anything useful with it. So your file written from the command shell... what does it mean? If I don't know what it means, what does it matter whether I know to read one byte at a time or four bytes as an integer?
(As I mentioned above, there's an intermediate option where code can understand structure without meaning, and expose that structure to other code which then imposes meaning - XML is a classic example of that.)

It's all a matter of interpretation. Neither the file nor the system know what's going on in your file, they just see your storage as a sequence of bytes that has absolutely no meaning in itself. The same thing happens in your brain when you read a word (you attempt to choose a language to interpret it in, to give the sequence of characters a meaning).
It is the responsibility of your program to interpret the data the way you want it, as there is no single valid interpretation. For example, the sequence of bytes 48 65 6C 6C 6F 20 53 6F 6F 68 6A 75 6E can be interpreted as:
A string (Hello Soohjun)
A sequence of 12 one-byte characters (H, e, l, l, o, , S, o, o, h, j, u, n)
A sequence of 3 unsigned ints followed by a character (1214606444, 1864389487, 1869113973, 110)
A character followed by a float followed by an unsigned int followed by a float (72, 6.977992E22, 542338927, 4.4287998E24), and so on...
You are the one choosing the meaning of those bytes, another program would make a different interpretation of the very same data, much the same a combination of letters has a different interpretation in say, English and French.
PS: By the way, that's the goal of reverse engineering file formats: find the meaning of each byte.

Compress small string

Maybe there are any way to compress small strings(86 chars) to something smaller?
#a#1\s\215\c\6\-0.55955,-0.766462,0.315342\s\1\x\-3421.-4006,3519.-4994,3847.1744,sbs
The only way I see is to replace the recurring characters on a unique character.
But i can't find something about that in google.
Thanks for any reply.

http://en.wikipedia.org/wiki/Huffman_coding
Huffman coding would probably be pretty good start. In general the idea is to replace individual characters with the smallest bit pattern needed to replicate the original string or dataset.
You'll want to run statistical analysis on a variety of 'small strings' to find the most common characters so that the more common characters will be represented with the smallest unique bit patterns. And possibly makeup a 'example' small string with every character that will need to be represented (like a-z0-9#.0-)

I took your example string of 85 bytes (not 83 since it was copied verbatim from the post, perhaps with some intended escapes not processed). I compressed it using raw deflate, i.e. no zlib or gzip headers and trailers, and it compressed to 69 bytes. This was done mostly by Huffman coding, though also with four three-byte backward string references.
The best way to compress this sort of thing is to use everything you know about the data. There appears to be some structure to it and there are numbers coded in it. You could develop a representation of the expected data that is shorter. You can encode it as a stream of bits, and the first bit could indicate that what follows is straight bytes in the case that the data you got was not what was expected.
Another approach would be to take advantage of previous messages. If this message is one of a stream of messages, and they all look similar to each other, then you can make a dictionary of previous messages to use as a basis for compression, which can be reconstructed at the other end by the previous messages received. That may offer dramatically improved compression if they messages really are similar.

You should look up RUN-LENGTH ENCODING. Here is a demonstration
rrrrrunnnnnn BECOMES 5r1u6n WHAT? truncate repetitions: for x consecutive r use xr
Now what if some of the characters are digits? Then instead of using x, use the character whose ASCII value is x. for example,
if you have 43 consecutive P, write +P because '+' has ASCII code 43. If you have 49 consecutive y, write 1y because '1' has ASCII code 49.
Now the catch, which you will find with all compression algorithms, is if you have a string with little or no repetitions. Then in that case your code may be longer than the original word. But that's true for all compression algorithms.
NOTE:
I don't encourage using Huffman coding because even if you use the Ziv-Lempel implementation, it's still a lot of work to get it right.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.