0

I am writing a library for CubeMars motors that use the CAN communication protocol. The library uses 2 threads: the main thread does almost everything, while the second thread is a listener that listens to the CAN bus (= to the motors' answers and confirmations) and saves those information. The main thread can later access that info.

Typically, the main thread writes a command to the CAN bus. Motors send a confirmation packet that is seen and saved by the listener thread. The main thread waits until the confirmation is received before continuing.

This all usually takes around 500 us. However, in about 1% of cases (that happen at random times), this time jumps up to about 10 ms, messing up the control loop. The problem is not coming from the motors themselves.

Used setup:

  • CubeMars motor: AK60-6 v.1.1 in MIT mode
  • Raspberry 4B with Arch Linux ARM
  • MCP2515 CAN board
  • Communication via CAN, 1M baudrate
  • Code written in C++

Here is the code creating the socket:

int MotorHandler::openSocket(const char* can_bus)
{
    // Create a CAN socket
    int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);
    if (s < 0) {
        cout << "Error creating the socket! Exiting" << endl;
        exit(1);
    }
    else   
        cout << "Socket created successfully" << endl;

    // Set socket timeout
    struct timeval tv;
    tv.tv_sec = 0;
    tv.tv_usec = SOCKET_TIMEOUT_US;
    int success = setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, (const char*)&tv, sizeof(tv));
    if (success < 0) {
        cout << "Error setting the timeout to the CAN socket" << endl;
        exit(1);
    }

    // Bind the socket to the CAN bus
    struct sockaddr_can addr;
    struct ifreq ifr;
    strcpy(ifr.ifr_name, can_bus);
    ioctl(s, SIOCGIFINDEX, &ifr);

    memset(&addr, 0, sizeof(addr));
    addr.can_family = AF_CAN;
    addr.can_ifindex = ifr.ifr_ifindex;

    int result = bind(s, (struct sockaddr *)&addr, sizeof(addr));
    if (result < 0) {
        cout << "Binding error! Exiting" << endl;
        exit(1);
    }
    else
        cout << "Binding successful" << endl;

    return s;
}

This is the function executed by the listener thread:

int Listener::listenerLoop(int s)
{
    bool stopThread = 0;

    cout << "Starting CAN bus monitoring" << endl;
    while(!stopThread) {

        struct can_frame frame;
        int nbytes = read(s, &frame, sizeof(can_frame));
        if (nbytes > 0)
            parseFrame(frame);

        // Thread sleep for scheduling
        std::this_thread::sleep_for(chrono::microseconds(10));
        {
            scoped_lock lock(m_mutex);
            stopThread = m_stopThread;  
        }
    }

    return 0;
}

What I tried

I was at first using a non-blocking socket.

  1. To make debugging easier, I created a test where I controlled only 1 motor in a single thread, using a non-blocking socket: I sent a command and then listened to the CAN bus waiting for the motor's confirmation. The behavior was the same problematic one, with random delays occurring. When I swapped the socket to blocking mode, everything worked perfectly, with no delays happening.

  2. In the library with the 2 threads, I also set the socket to blocking mode, but it did not change anything: random delays keep happening.

This leads me to think the problem might be in buffering or scheduling, since the issue only arises when using 2 threads. I tried various timers for thread sleeping, but to no avail (it is currently set to 10 us).

I frankly do not understand why that same problem was happening when using a non-blocking socket in a single thread, but disappeared when it got swapped to blocking mode. The reason might be the key to fixing the issue when using 2 threads.

Any help with this would be greatly appreciated.

4
  • Shouldn't the whole ifr struct be initialized? You only init one member, the rest of them are indeterminate. Same thing with tv. That's all quite fishy code. Commented Dec 2, 2024 at 14:02
  • In the future please make sure to tag your question with C++, at least for formatting purposes, since it contains C++ code, as well as Embedded, to reach the right audience who are most likely to give a good answer. I only stumbled upon this question since I watch the CAN tag, which not many people do. Commented Dec 2, 2024 at 14:08
  • Linux is not an RTOS. Random scheduling delays are a given. If the system were running just those two threads, you'd wonder, but it is Linux, and likely has several tens of processes running just to boot up. You may have better luck using a kernel built with PREEMPT_RT patches, but I wouldn't rely on that if you need hard real-time response. Commented Dec 14, 2024 at 8:55
  • ... I note that since kernel v6.12 PREEMPT_RT is the mainline scheduler for Linux. If you have that then you need an expert on Real Time on Linux. You should probably use the [embedded-linux] tag to get the attention of that crowd. I laugh at boot-times in excess of 50ms ;-) Commented Dec 14, 2024 at 9:08

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.