
I am using boost::beast to read data from a websocket into a std::string. I am closely following the example websocket_sync_client.cpp in Boost 1.71.0, with one change: the I/O is binary. There is no text handler at the server end, only a binary stream. Hence, I added one line of code to the example:

    // Make the stream binary?? https://github.com/boostorg/beast/issues/1045
    ws.binary(true);

Everything works as expected: I send a message, then read the response to it into a std::string using boost::beast::buffers_to_string:

        // =============================================================
        // This buffer will hold the incoming message
        beast::flat_buffer wbuffer;

        // Read a message into our buffer
        ws.read(wbuffer);
        // =============================================================

        // ==flat_buffer to std::string=================================
        std::string rcvdS = beast::buffers_to_string(wbuffer.data());
        std::cout << "<string_rcvdS>" << rcvdS << "</string_rcvdS>" << std::endl;
        // ==flat_buffer to std::string=================================

This almost works as I expected, except that some kind of escaping is happening on the data of the (binary) stream. I suspect some layer of Boost logic (perhaps character traits?) is causing every non-printable character to be escaped into human-readable '\u????' text.

The binary data that is read contains many (intentional) non-printable ASCII control characters that delimit and organize chunks of data in the message.

I would rather the stream did not escape these non-printable characters, since I will have to undo that escaping anyway if I cannot coerce the read buffer into leaving the data raw. Using another Boost API to undo the escaping would just be wasted processing that hurts performance.

Surely this has a simple solution. How can I make the flat_buffer filled by ws.read, and then copied into rcvdS, contain truly raw, unescaped bytes? Is that possible, or do I need to choose a different buffer class so that the escaping does not happen?

Here is a visual aid showing expected vs. actual data:


2 Answers


Beast does not alter the contents of the message in any way. The only thing that binary() and text() do is set a flag in the message, which the other end receives. Text messages are validated against the legal character set (UTF-8), while binary messages are not. Message data is never changed by Beast. buffers_to_string just transfers the bytes in the buffer to a std::string; it does not escape anything. So if the buffer contains a NUL or, let's say, a Ctrl+A, you will get a 0x00 and a 0x01 in the std::string respectively.

If your message is being encoded or translated, it isn't Beast that is doing it. Perhaps it is a consequence of writing the raw bytes to std::cout? Or it could be whatever you are using to display those messages in the image you posted. I note that the code you provided does not match the image.
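One way to see what actually arrived, independent of how a console or text editor renders control characters, is to print each byte as hex. A sketch, with `hex_dump` as a hypothetical helper name:

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical diagnostic helper: render every byte of a payload as two
// lowercase hex digits separated by spaces, so control characters are
// visible no matter how a console or editor would display them.
std::string hex_dump(const std::string& bytes)
{
    std::ostringstream os;
    os << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        if (i != 0)
            os << ' ';
        // Cast through unsigned char so bytes >= 0x80 are not sign-extended.
        // setw must be reapplied for every value it should affect.
        os << std::setw(2)
           << static_cast<unsigned>(static_cast<unsigned char>(bytes[i]));
    }
    return os.str();
}
```

For example, `std::cout << hex_dump(rcvdS) << '\n';` prints a raw Ctrl+A as `01`, whereas an escaped `\u0001` that arrived as text shows up as six bytes, `5c 75 30 30 30 31`.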


Comments

The image I posted is how Notepad++ renders non-printable ASCII characters, as black "ASCII codes". For example, if the data contains the form feed ASCII character (1 byte), Notepad++ displays a black "FF" icon. That is the best way I could find to visually display the non-printable ASCII characters in the data. When I step with Visual Studio into all the code that "ws.read(wbuffer);" hits, I step through 'read_some' in beast\websocket\impl\read.hpp, which eventually steps through 'impl.rd_utf8.write' and 'impl.rd_utf8.finish()', and I believe that is what causes it... but I'm still digging. :-|
rd_utf8 is simply a UTF-8 validator, it does not modify anything. That is the difference between text and binary: text messages are checked for valid utf-8, while binary messages are not. Otherwise they are identical. Beast does not change the message contents!
How did you get this data into Notepad++?
Sorry for the delay. I am working on a C++ proof of concept using Boost.Beast that will replace the network communication class in a native C++ console application. Currently, that application receives all of its binary communication from a backend server's Java WebSocket connection. I captured the first 6 response messages that arrive in the current client from the server, first by writing the binary data to a file, and also by inspecting them in Visual Studio while debugging the client. I will email you a TXT file containing the first 6 (binary) messages.
@"Vinnie Falco" To get the data into Notepad++ so that you can "see" the non-printable characters, I downloaded Unifont and set Notepad++ to use it. This font renders each non-printable ASCII character as its ASCII name in inverse video, which makes it very easy to see exactly which control characters are in your data.

If anyone else lands here, rest assured: it is your server end, not the client end, that is escaping your data.
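A quick client-side check, assuming the escapes take the `\uXXXX` form described in the question, is to count literal backslash-u sequences in the received string. If they are already present right after buffers_to_string, the sender produced them. This is a sketch; `count_unicode_escapes` is a hypothetical helper, not part of Beast.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical check: count literal "\uXXXX" sequences (backslash, 'u',
// four hex digits) in a received payload. Raw control bytes never match,
// so a non-zero count means the text arrived already escaped.
std::size_t count_unicode_escapes(const std::string& payload)
{
    std::size_t count = 0;
    // i + 6 <= size guarantees indices i .. i+5 are in range.
    for (std::size_t i = 0; i + 6 <= payload.size(); ++i)
    {
        if (payload[i] != '\\' || payload[i + 1] != 'u')
            continue;

        bool all_hex = true;
        for (std::size_t k = 2; k < 6; ++k)
        {
            if (!std::isxdigit(static_cast<unsigned char>(payload[i + k])))
            {
                all_hex = false;
                break;
            }
        }
        if (all_hex)
        {
            ++count;
            i += 5; // skip the rest of this escape (loop adds the final +1)
        }
    }
    return count;
}
```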
