Tales along the way in my quest to Integrate Everything.

I was working on Decider the other night when I realized I was coming up to my 1000th commit in SVN. Curious, I pulled a log of Decider revisions to see when I had committed the very first revision of Venturii Decider. Would you believe that my first commit to the Decider repo was on Wednesday, March 16, 2011! Here are the notes from that initial entry:

r1 | cube | 2011-03-16 21:52:38 -0600 (Wed, 16 Mar 2011) | 5 lines

Initial import.

Scripts automatically install to /usr/local/etc/Venturii/ for config, and usr/local/bin for decider. :) :) :)

Work in progress.

One might consider that date to be Decider’s birthday, though its origins date back a few years before I started using a versioning system. I remember tarring and gzipping my entire source tree whenever I wanted to “save” a particular version to fall back on in case subsequent changes became unstable or I needed to refer to deleted code later on. It is hard to believe I was able to accomplish as much with Venturii as I did without any version control, or at that time even an IDE! For years, I wrote Venturii using nothing more than Vim and gcc! My how things have changed since then…

At that time, I had created a number of Venturii modules: one for integrating with X-10 switches (remember those?!?), one for controlling Bosch Autodome PTZ cameras, one for integrating my DSC MAXSYS PC-4020 burglar alarm system through a PC-4401 RS-232 module, one for talking to various Access Control I/O boards, and one for reading analog sensor data from an ADC I/O module I bought from www.controlanything.com. Actually, I had initially called the project OpenHouse, but this sounded too much like it might be related to real estate, so the name later changed to Venturii. I think I wrote up the story of the name somewhere else so I won’t recount it here.

The Bosch Autodome cameras I had set up around the house integrated with Venturii using the “twophase” module, as I named it, which produced the necessary PTZ commands using the Autodome protocol, while a Philips LTC8780/60 Data Converter Unit translated the electrical signal from RS-232 to actual Biphase to go to each camera. The XR series Analog to Digital converter board was part of the original inception of Venturii, in that it was my first crack at reading analog sensor data from an LM34 temperature sensor in order to open and close the Honeywell ARD-5 damper I’d installed on the heat vent feeding my bedroom.

I never imagined back then that this humble creation would eventually be running entire buildings!

I’ve hit a most frustrating problem. I am creating a new Venturii Buscom plugin using libmodbus, but before I got very far I became stumped with a strange problem. Here is the code:

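#include <stdio.h>
#include <errno.h>
#include <modbus.h>

int main(void) {
  modbus_t *mb;
  uint16_t tab_reg[32];
  int i;

  mb = modbus_new_tcp("192.168.1.100", 502);
  if (mb == NULL) {
    fprintf(stderr, "Unable to allocate libmodbus context\n");
    return -1;
  }

  /* Print every request and response frame as it is sent and received */
  modbus_set_debug(mb, 1);

  /* Reconnect on link errors; sleep and flush the socket on protocol errors */
  modbus_set_error_recovery(mb,
    MODBUS_ERROR_RECOVERY_LINK |
    MODBUS_ERROR_RECOVERY_PROTOCOL);

  if (modbus_connect(mb) == -1) {
    fprintf(stderr, "Connection failed: %s\n", modbus_strerror(errno));
    modbus_free(mb);
    return -1;
  }

  /* Read 16 registers starting at address 0 */
  int n = modbus_read_registers(mb, 0, 16, tab_reg);

  printf("Result: [%d]\n", n);

  if (n == -1) {
    printf("Error: [%s]\n", modbus_strerror(errno));
  }

  /* On failure n is -1, so this loop prints no register values */
  printf("Register Data: ");
  for (i = 0; i < n; i++) {
    printf("[%02x] ", tab_reg[i]);
  }

  printf("\n");

  modbus_close(mb);
  modbus_free(mb);
  return 0;
}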

I downloaded the latest source for libmodbus and installed it into /usr/local. Therefore, I am compiling with:


gcc mytest.c -I /usr/local/include/modbus/ -L /usr/local/lib/ -lmodbus

The output:

Connecting to 192.168.1.100:502
[00][01][00][00][00][06][FF][03][00][00][00][10]
Waiting for a confirmation...
<00><01><00><00><00><23><FF><03><10><00><00><00><00><00><00><00><0F><00><00><00><00><00><00><00><0A>
Message length not corresponding to the computed length (25 != 41)
Bytes flushed (16)
Result: [-1]
Error: [Invalid data]
Register Data: 

It does not make any difference if I choose a smaller number of registers to read; the result is always a complaint from libmodbus that not enough bytes were received from the slave. What is strange to me, though, is that the remaining bytes are there! In Wireshark, I see the entire query was sent from my test program to the modbus simulator in a single packet:

Wireshark capture of my MODBUS Request

And the response from the simulator is delivered in a single packet – containing the correct data. (The bytes line up with the actual bytes in the simulator’s register).

Wireshark capture of the MODBUS Response

Perplexing me further is the fact that, due to the error handling parameters I’ve set on my modbus context, you can see that after claiming there is not enough data, libmodbus then flushes the remainder of the buffer – which mysteriously contains the exact number of missing bytes.

In order to understand why this is not working, the next step, I thought, would be to analyze the method by which libmodbus computes the actual number of bytes it receives. We are going to make several assumptions off the bat: First, I am assuming that it has already correctly identified the number of bytes that it is expecting. Second, I’m going to assume that the operating system is not doing something funny “under the hood” with the incoming data. Since we can see the entire response in a single packet, I’m assuming that the entire packet was available at the sockfd for reading.

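Looking at the definition of the MODBUS TCP frame format (from Wikipedia), a frame starts with a 2-byte transaction identifier, a 2-byte protocol identifier, a 2-byte length field and a 1-byte unit identifier, followed by the function code and the data.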

Again, here is our received data:

<00><01><00><00><00><23><FF><03><10><00><00><00><00><00><00><00><0F><00><00><00><00><00><00><00><0A>

The first two bytes, 00 01, are our TID or Transaction Identifier. Check.

The next two bytes are both 00. This is the Protocol Identifier, which is always zero for Modbus/TCP. Check.

Length field: 00 23 = Number of bytes remaining in this frame. So far we’ve read 6 bytes into the frame, and 0x23 in Hex is 35 in Base-10 or Decimal. 35 + 6 = 41. So far, so good. 0xFF is our slave address. Next we have our function code, 0x03 (Read Holding Registers), and thus begins our data with 0x10, which is 16 in decimal and which I took, in this context, to indicate that there are 16 words coming.

All of that appears correct to me. So why are libmodbus 3.0.8 and 3.1.6 both complaining that not enough bytes are in the message, and then promptly flushing all the missing bytes? Why can I find nothing online about this error message? Is no one else seeing results like this? Libmodbus is part of every major Linux distribution, so either no one is using it anymore, or it probably works just fine, leaving the problem back in my court. Could this be some OS level thing? Buffering, perhaps, of some kind?

Pondering that last query, I decided to try my test code on a new VM. Actually, I pulled a new Fedora 31 container, updated everything on it and ran the same test code above. Would you like to take a guess at the results?

[root@0f574053a251 /]# gcc mytest.c -I /usr/include/modbus/ -lmodbus
[root@0f574053a251 /]# ./a.out 
Connecting to 192.168.1.100
[00][01][00][00][00][06][FF][03][00][00][00][10]
Waiting for a confirmation...
<00><01><00><00><00><23><FF><03><10><00><00><00><00><00><00><00><0F><00><00><00><00><00><00><00><0A>
Message length not corresponding to the computed length (25 != 41)
16 bytes flushed
Result: [-1]
Error: [Invalid data]
Register Data: 
[root@0f574053a251 /]#

It is worth noting that even Fedora 31 still carries the 3.0.8 version of libmodbus, probably because it was the most stable release. Nonetheless – the exact same result! Now what? I thought I might run some of the built-in tests that come with the source version.

 TEST INVALID INITIALIZATION:
The device string is empty
OK
The baud rate value must not be zero
OK
The service string is empty
OK

ALL TESTS PASS WITH SUCCESS.

Ok, so the library thinks it is sane… I wonder what its data looked like when it tried to utilize the 0x03 function code. (There was a lot of data spit out in both the client and server test programs…) Scrolling back I found this:

[00][00][00][00][00][06][FF][03][01][60][00][00]
* try function 0x3: read 0 values: Waiting for a confirmation...
<00><00><00><00><00><03><FF><83><03>
OK
[00][00][00][00][00][06][FF][03][01][60][00][7E]
* try function 0x3: read 126 values: Waiting for a confirmation...
<00><00><00><00><00><03><FF><83><03>
OK

It doesn’t matter how many registers I request; the result always appears to be short. In this next example, I attempt to read only one register:

[00][01][00][00][00][06][00][03][00][00][00][01]
Waiting for a confirmation...
<00><01><00><00><00><05><00><03><00>
Message length not corresponding to the computed length (9 != 11)
Bytes flushed (2)

If I call modbus_read_bits() instead, and request a single bit:

[00][01][00][00][00][06][00][01][00][00][00][01]
Waiting for a confirmation...
<00><01><00><00><00><04><00><01><00>
Message length not corresponding to the computed length (9 != 10)
Bytes flushed (1)

I am still one byte short, and then it merrily flushes one byte.

I got my first break when I tried to request 4 bits using modbus_read_bits() (function code 0x01):

[00][01][00][00][00][06][FF][01][00][01][00][04]
Waiting for a confirmation...
<00><01><00><00><00><04><FF><01><01><02>
Result: [4]
Register Data: [00] [01] [00] [00]

Wait, what?! That worked??? I tried other quantities:

[00][01][00][00][00][06][FF][01][00][01][00][08]
Waiting for a confirmation...
<00><01><00><00><00><04><FF><01><01><02>
Result: [8]
Register Data: [00] [01] [00] [00] [00] [00] [00] [00]
(Requesting 128 bits starting at address 1)
[00][01][00][00][00][06][FF][01][00][01][00][80]
Waiting for a confirmation...
<00><01><00><00><00><13><FF><01><08><04><04><08><24><00><00><00><00>
Message length not corresponding to the computed length (17 != 25)
Bytes flushed (8)
(Requesting 9 bits starting at address 1)
[00][01][00][00][00][06][FF][01][00][01][00][09]
Waiting for a confirmation...
<00><01><00><00><00><05><FF><01><01><02>
Message length not corresponding to the computed length (10 != 11)
Bytes flushed (1)
(Requested 8 bits again from address 1 to make sure I wasn't going crazy)
[00][01][00][00][00][06][FF][01][00][01][00][08]
Waiting for a confirmation...
<00><01><00><00><00><04><FF><01><01><02>
Result: [8]
Register Data: [00] [01] [00] [00] [00] [00] [00] [00]

Message length not corresponding to the computed length (51 != 93)
Bytes flushed (42)

Now I am more confused than ever. Some requested quantities return data properly, others fail. In further testing I started to notice a pattern: any time I requested an even number of registers, for example, libmodbus would receive the correct number of bytes MINUS the number of registers I’d requested… every time. If I requested 100 registers, I’d get “Message length not corresponding to the computed length (109 != 209) … Bytes flushed (100)”. If I requested 42 registers, it would reply, “Message length not corresponding to the computed length (51 != 93) … Bytes flushed (42)”. Any odd number of requested registers came back with that number plus 1 bytes missing, such as this request for 43 registers: “Message length not corresponding to the computed length (51 != 95) … Bytes flushed (44)”.

I started to wonder if there was some miscalculation in the library, and with nothing else I could think to try, it was at this point that I decided to dive into the source code for libmodbus. My first step was to add some debugging output at various points along the way as it grabs the response byte by byte. Here you can see the entire output of my test run, requesting 16 registers starting at address 1:

[00][01][00][00][00][06][FF][03][00][01][00][10]
Waiting for a confirmation...
_modbus_receive_msg(DEBUG): Length to read: [8]
_modbus_receive_msg(DEBUG): recv(msg_length: [0] length_to_read: [8]) retval: [8]
<00><01><00><00><00><23><FF><03>_modbus_receive_msg(DEBUG): Length to read: [1]
_modbus_receive_msg(DEBUG): recv(msg_length: [8] length_to_read: [1]) retval: [1]
<10>_modbus_receive_msg(DEBUG): Length to read: [16]
_modbus_receive_msg(DEBUG): recv(msg_length: [9] length_to_read: [16]) retval: [16]
<00><00><00><00><00><0F><00><00><00><00><00><00><00><0A><00><00>
check_confirmation(DEBUG): rsp_length_computed: [41]
Message length not corresponding to the computed length (25 != 41)
Bytes flushed (16)

It reads the first 8 bytes to figure out what this message is about. It then asks for one byte to determine how much data follows, and we get <10>, which of course is 16 in decimal. But look at the next line: it asks for 16 more bytes, not 32 as I expected. Keep in mind that these holding registers are 16 bits wide each. Therefore, either the library has interpreted 16 to mean sixteen 8-bit bytes (which I suspected was incorrect), or the modbus simulator I am using is incorrectly reporting words when it should be reporting bytes, “confusing” the library into reading only 16 more bytes when it has already computed that there should be 32. Instead of flipping a coin, I decided to turn to the MODBUS specification to find out what this value *should* be, and whether there are special circumstances under which either interpretation could be correct.

The specification settles it: libmodbus is correct – this field is the byte count, the number of 8-bit bytes that should follow in the response. Therefore, it appears that my simulator is to blame here. Looking now towards MOD_RSSIM, the MODBUS simulator I was using, I had version 6.7. It turns out there have been a lot of newer versions since then, and the tool is still actively developed. I will download and try the new version tomorrow and see what my results are then.

It is now the morning, and with MOD_RSSIM version 8.20 in place of 6.7, I re-ran all of my tests again and guess what? Everything works! What a rabbit hole to have fallen into, but such is the nature of software development, problem solving, and discovery! Now to finish the Venturii Buscom Modbus plugin and move on to bigger and brighter things!

I’ve been trying to create a proof-of-concept program that makes use of the open source FINS library I came across here: https://www.libfins.org/

The first obstacle I encountered was that there do not appear to be any examples on the Internet that use this library. In my experience, this means one of two things:

  1. What I am trying to do is so simple that no one else has bothered writing it up, or
  2. I am going about it entirely the wrong way and am so far off the right course that no one else has gotten to this place.

I looked into the source code itself, for I had noticed some discrepancies between the published documentation and actual function calls. Since the source code is almost always correct, it seemed like a good place to start.

I got a skeleton program written and it seemed able to connect to the PLC, in this case a CP1L-EM30DR-D on loan from Omron. However, any subsequent command I tried to execute, such as finslib_cpu_unit_status_read() or finslib_cpu_unit_data_read() or finslib_memory_area_read_uint16() all failed with the same error message indicating that too many errors had occurred and that the connection had been terminated.

tcpdump showed that the test program was in fact connecting to the PLC, but little more could be gleaned from the hex dumps it spat out. I began to walk through the library code, inserting some console chatter to try to find out exactly where each call was failing. I corrected a few mistakes I’d made. One thing that became apparent was that some of the initialization only takes place if you pass it a null pointer for the fins_sys_tp* parameter. I was creating a fins_sys_tp struct, initializing it to zeroes and passing the address of this structure to finslib_tcp_connect(), which of course then skipped all the initialization. Because of this, the connection type was set to “unknown” by default, so the communication functions all would error out immediately. Passing finslib_tcp_connect() a null pointer instead yielded the correct result.

The next thing I needed to correct was that my model of PLC was not known to the library, so it left another variable, plc_mode, set to FINS_MODE_UNKNOWN. This, too, caused the communication functions to fail to send data, since they did not know what format to use. As a test, I modified the library to try using CV mode on my model of CPU, which it seemed to like, as all the functions invoked after that returned much more meaningful return values, and tcpdump got a lot more excited.

After that, the rest of my proof-of-concept tests proceeded much more to my liking. I was able to read and write directly to the CIO memory on the PLC, which in turn activated and de-activated outputs and LEDs on the unit. Next up I will try to read some input values (the PLC is about 30 km away from me at the moment) but I expect the remainder of the tests will go much more smoothly. Thus, we now have a much clearer path to developing the FINS Buscom plugin!

Here is my test code, in case it may help anyone else save several days of head scratching and Googling:

/*
 * Program: libfins Test Program
 * File:    test.c
 * Author:  John Finlay
 *
 * This file is licensed under the MIT License as stated below
 *
 * Copyright (c) 2019 John Finlay
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

#include "include/fins.h"
#include <stdio.h>
#include <stdint.h>


int main() {

	/* Initialize */

	int err = 0, err_max = 10;
	struct fins_sys_tp *c = NULL;

	// memset(&c, 0, sizeof(struct fins_sys_tp)); // Not needed - finslib_tcp_connect() only sets up default variables if it is passed a NULL pointer.
	// init_system(&c, err_max);

	/* Connect to the PLC */

	struct fins_sys_tp *sys = finslib_tcp_connect(c, "192.168.250.10", 9600, 1,
			5, 0, 1, 10, 0, &err, err_max);
	printf("Connection Call:   Error Code was [%d]\n", err);

	char err_msg[64];
	finslib_errmsg(err, err_msg, 64);
	printf("Connection Call:   Error Message Was: [%s]\n", err_msg);

	/* Every call below dereferences sys, so stop here if no structure came back. */
	if (sys == NULL) {
		return 1;
	}

	/* Find out what kind of PLC we are talking to */

	struct fins_cpudata_tp cpudata;

	int cuer = finslib_cpu_unit_data_read(sys, &cpudata);

	finslib_errmsg(cuer, err_msg, 64);
	printf(
			"Read CPU Data:     Error Message Was: [%d] [%s] - sys->error_count = [%u] sockfd: [%u]\n",
			cuer, err_msg, sys->error_count, sys->sockfd);

	/* Read CPU Status */

	struct fins_cpustatus_tp cpustat;
	int cpustat_ret = finslib_cpu_unit_status_read(sys, &cpustat);
	finslib_errmsg(cpustat_ret, err_msg, 64);
	printf("CPU Unit Stat Read Error Message Was: [%s]\n", err_msg);

	/* Read Memory Area */

	uint16_t arr[2048];
	int i;
	for (i = 0; i < 2048; i++) {
		arr[i] = 0;
	} // Could use memset...

	int num = 16;
	int read_ret = finslib_memory_area_read_uint16(sys, "CIO100.0", arr, num);

	finslib_errmsg(read_ret, err_msg, 64);
	printf("Memory Area Read Error Message Was: [%s]\n", err_msg);

	for (i = 0; i < num; i++) {
		printf("arr[%d] = [%u]\n", i, (unsigned) arr[i]);
	}

	/* Write Memory Area */

	for (i = 0; i < 2048; i++) {
		arr[i] = 0;
	} // Again, could use memset...

	num = 16;
	int write_ret = finslib_memory_area_write_uint16(sys, "CIO100.0", arr, num);
	finslib_errmsg(write_ret, err_msg, 64);
	printf("Memory Area Write Error Message Was: [%s]\n", err_msg);

	for (i = 0; i < num; i++) {
		printf("arr[%d] = [%u]\n", i, (unsigned) arr[i]);
	}

	/* Read Error Log */

	struct fins_errordata_tp errordat;
	size_t num_to_read = 1;
	size_t num_read = 0;
	int err_ret = finslib_error_log_read(sys, &errordat, 0, &num_to_read,
			&num_read);

	finslib_errmsg(err_ret, err_msg, 64);
	printf(
			"Read Error Log:    Error Message Was: [%s] - Requested: [%zu] Records That Were Read: [%zu]\n",
			err_msg, num_to_read, num_read);

	/* Read CPU Data */

	struct fins_cpudata_tp cpuinfo;

	int cpu_ret = finslib_cpu_unit_data_read(sys, &cpuinfo);

	finslib_errmsg(cpu_ret, err_msg, 64);
	printf("CPU Unit Data Read Error Message Was: [%s]\n", err_msg);

	/* Disconnect */

	finslib_disconnect(sys);
	printf("Connection closed.\n");
	return 0;
}

Compile with the following command:

gcc -Wall test.c -o test -Llib -lfins

Already April!

Venturii Buscom has seen tremendous development over the past month, along with what may even be an architecture shift for the entire Venturii platform. Whereas I used to create separate modules whose sole purpose was to communicate with one type of device, service or system, I found that I was having to re-use a lot of the foundational code in each module. This resulted in many copies of essentially the same thing, and applying bug fixes and feature additions to all of them became tedious, repetitive and, in some cases, intermittent. Then I had an epiphany one morning as I was working on Buscom – why not modularize the communication code?

I wandered down the path of this thought process and came upon a system in which clear delineation points became apparent between what would be the base module (Buscom) and the differentiated code necessary for communicating with each disparate system. It made sense to me to separate this code into plug-ins, each of which could utilize the same base code Buscom was providing for establishing socket or serial communication with a thing, as well as the backbone communication to Decider. This led to the rapid development of three plug-ins concurrently, each of which was growing off the same trunk of code.

Naturally one can still deploy multiple instances of Buscom, and indeed one could continue to utilize a single instance of it to house the communication pathway between a single system and the rest of Venturii. Thus there would still be all the benefits of an individual module communicating with an individual system, without the drawbacks of having to maintain multiple copies of the same pieces of code.

This new architecture has me so excited that I am planning to develop (or in some cases, re-develop) more integrations in the near future using it. Indeed it expedites the development process significantly, allowing more time to be spent on the actual integration and less in preparation for it. Look for a new integration announcement in the next month or so!