I am writing a library for CubeMars motors that use the CAN communication protocol. The library uses 2 threads: the main thread does almost everything, while the second thread is a listener that listens to the CAN bus (= to the motors' answers and confirmations) and saves those information. The main thread can later access that info.
Typically, the main thread writes a command to the CAN bus. Motors send a confirmation packet that is seen and saved by the listener thread. The main thread waits until the confirmation is received before continuing.
This all usually takes around 500 us. However, in about 1% of cases (that happen at random times), this time jumps up to about 10 ms, messing up the control loop. The problem is not coming from the motors themselves.
Used setup:
- CubeMars motor: AK60-6 v.1.1 in MIT mode
- Raspberry 4B with Arch Linux ARM
- MCP2515 CAN board
- Communication via CAN, 1M baudrate
- Code written in C++
Here is the code creating the socket:
int MotorHandler::openSocket(const char* can_bus)
{
// Create a CAN socket
int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);
if (s < 0) {
cout << "Error creating the socket! Exiting" << endl;
exit(1);
}
else
cout << "Socket created successfully" << endl;
// Set socket timeout
struct timeval tv;
tv.tv_sec = 0;
tv.tv_usec = SOCKET_TIMEOUT_US;
int success = setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, (const char*)&tv, sizeof(tv));
if (success < 0) {
cout << "Error setting the timeout to the CAN socket" << endl;
exit(1);
}
// Bind the socket to the CAN bus
struct sockaddr_can addr;
struct ifreq ifr;
strcpy(ifr.ifr_name, can_bus);
ioctl(s, SIOCGIFINDEX, &ifr);
memset(&addr, 0, sizeof(addr));
addr.can_family = AF_CAN;
addr.can_ifindex = ifr.ifr_ifindex;
int result = bind(s, (struct sockaddr *)&addr, sizeof(addr));
if (result < 0) {
cout << "Binding error! Exiting" << endl;
exit(1);
}
else
cout << "Binding successful" << endl;
return s;
}
This is the function executed by the listener thread:
int Listener::listenerLoop(int s)
{
bool stopThread = 0;
cout << "Starting CAN bus monitoring" << endl;
while(!stopThread) {
struct can_frame frame;
int nbytes = read(s, &frame, sizeof(can_frame));
if (nbytes > 0)
parseFrame(frame);
// Thread sleep for scheduling
std::this_thread::sleep_for(chrono::microseconds(10));
{
scoped_lock lock(m_mutex);
stopThread = m_stopThread;
}
}
return 0;
}
What I tried
I was at first using a non-blocking socket.
To make debugging easier, I created a test where I controlled only 1 motor in a single thread, using a non-blocking socket: I sent a command and then listened to the CAN bus waiting for the motor's confirmation. The behavior was the same problematic one, with random delays occurring. When I swapped the socket to blocking mode, everything worked perfectly, with no delays happening.
In the library with the 2 threads, I also set the socket to blocking mode, but it did not change anything: random delays keep happening.
This leads me to think the problem might be in buffering or scheduling, since the issue only arises when using 2 threads. I tried various timers for thread sleeping, but to no avail (it is currently set to 10 us).
I frankly do not understand why that same problem was happening when using a non-blocking socket in a single thread, but disappeared when it got swapped to blocking mode. The reason might be the key to fixing the issue when using 2 threads.
Any help with this would be greatly appreciated.
ifrstruct be initialized? You only init one member, the rest of them are indeterminate. Same thing withtv. That's all quite fishy code.