The problem with home automation software

The home automation space has consolidated heavily around MQTT (a lightweight pub-sub messaging protocol), and I think it's an excellent choice - it's not the problem with home automation software.

The state of (open source) home automation software

But before we can talk about the problem, let's get on the same page about the basic components of these distributed systems (loosely described in terms of the 7 layer OSI model):

An MQTT broker - the stateful server that coordinates distributing messages to subscribers and receiving published messages.
IoT devices - a variety of devices with different compute/memory capacity that send sensor data somewhere or receive controls for outputs (motors/lights/speakers) from somewhere. These devices send information in a wide variety of formats across multiple protocols.
IoT hubs - physical hardware with radios and network stacks capable of communicating with devices using less standard physical or data link layers. Zigbee operates in the same 2.4 GHz band as Wi-Fi and Bluetooth, but implements different lower layers (IEEE 802.15.4) and usually requires a dedicated hub to communicate with. Z-Wave, on the other hand, uses sub-GHz radio (around 900 MHz, depending on region). Hubs often 'bridge' these protocols over to Internet Protocol on your LAN or to an MQTT broker.
A frontend - commonly, Home Assistant. The frontend provides scripting, automation, and historical sensor data. MQTT brokers typically do not keep any record of historical data, so frontends often maintain their own database of past sensor readings.
Vended solutions - most consumer IoT devices come with some sort of package of broker and frontend (often a mobile app) administered by the company that made the device. These are usually a mess, and even when they aren't, they're a strong tool for vendor lock-in. I'll be avoiding these, even though they're the most common way for people to get started in the home automation space.
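To make the broker's role concrete, here is a toy, in-memory sketch of publish/subscribe with MQTT-style topic filters. It is not a real broker: there's no networking, QoS, or retained messages, and the `ToyBroker` class and the topic names are made up for illustration. The wildcard handling (`+` for one level, `#` for everything that follows) is a simplification of the rules in the MQTT spec.

```python
from typing import Callable

Callback = Callable[[str, str], None]


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a concrete topic.

    '+' matches exactly one level; '#' matches everything after it.
    Simplified: ignores '$'-prefixed system topics and filter validation.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard swallows the rest
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker (illustration only)."""

    def __init__(self) -> None:
        self._subscriptions: list[tuple[str, Callback]] = []

    def subscribe(self, topic_filter: str, callback: Callback) -> None:
        self._subscriptions.append((topic_filter, callback))

    def publish(self, topic: str, payload: str) -> None:
        # Deliver the message to every subscriber whose filter matches.
        for topic_filter, callback in self._subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)


broker = ToyBroker()
received: list[tuple[str, str]] = []
broker.subscribe("home/+/temperature", lambda t, p: received.append((t, p)))
broker.publish("home/kitchen/temperature", "21.5")  # delivered
broker.publish("home/kitchen/humidity", "40")       # no matching subscriber
print(received)
```

Note that the publisher never learns who (if anyone) received the message, which is the property the rest of this post leans on.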

It's pretty easy to end up with a bundle of devices with fairly specific application layer requirements that need to communicate - and historically, this is done in your system's frontend. For example, take a look at the Home Assistant integration list for local push and local polling (more information here about Home Assistant's integration classification). These plugins offer some specific value to the frontend - they abstract the application layer into Home Assistant's concept of entities - but because of this abstraction, they're tightly tailored to their particular circumstances.

The problem that kicked this off

I wanted to do Bluetooth presence tracking in Home Assistant - essentially, you use the received signal strength indicator (RSSI) of a Bluetooth Low Energy (BLE) beacon as measured from multiple radios to determine where the beacon is. This is a well-understood problem in general, and one with quite a few solutions tailored to Home Assistant. Here's a quick overview of the most promising projects I found:

monitor.sh, a small script for doing presence detection and reporting it back via MQTT. You then have to build the Home Assistant behaviors out of the MQTT primitives. Unfortunately, it uses a clever strategy for binary presence detection that stops working if one BLE device can be seen simultaneously by multiple radios. In my home, that's almost always the case.
ESPresense, an incredibly robust-looking Home Assistant plugin for doing 3D triangulation / trilateration of beacons using RSSI. It solves several other problems, including filtering to reduce jitter, implementing MQTT device discovery, and correctly tracking BLE devices that use resolvable private addresses instead of static MAC addresses. However... ESPresense assumes that all the IoT devices acting as radios are ESP32-based devices flashed with ESPresense's ESP32 firmware. Since my device fleet is made up of Raspberry Pi Zero Ws, I don't have a way to make use of the firmware. And because ESPresense assumes you'll only use their firmware to communicate with the system, the MQTT implementation is entirely undocumented, even though you should be able to use the Home Assistant plugin by emitting data with the right structure to the MQTT topics the project uses. As an additional caveat, the ESP32 devices flashed with this firmware become single-purpose devices: since ESPresense's firmware is not a general-purpose sensor platform, using the same ESP32 for presence detection and controlling other sensors is significantly harder.
Bermuda, a Home Assistant plugin that uses ESPHome (a general-purpose sensor firmware for ESP8266/ESP32/RP2040 devices). Unfortunately, it works by using ESPHome's Bluetooth proxy feature, which only works on ESP32s, so I once again cannot use the software with my fleet of Raspberry Pi Zero Ws. As an upside, ESP32s running ESPHome are not limited to being single-purpose devices, unlike ones running ESPresense's firmware.
room-assistant, a distributed application that you run on several Raspberry Pis (or other Linux computers). It elects a leader, coordinates and exchanges information about Bluetooth device RSSI, and comes to consensus about the location of the devices. Additionally, it solves several other interesting problems: it can debounce, compute rolling averages, and support a plethora of sensors, including Bluetooth, thermopile (directional heat sensors), millimeter wave presence sensing, and standard motion sensors (optical/IR). It offers an integration (uh oh) to bridge it to Home Assistant. Unfortunately, it's been unmaintained for 3 years, and the application's install script no longer works on modern versions of Raspbian and Node.js.
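For readers who haven't worked with RSSI before, the core idea shared by the projects above can be sketched in a few lines. This is an illustration under stated assumptions, not the method any of these projects is confirmed to use: it applies the textbook log-distance path loss model plus a rolling average, and the calibration constants (`rssi_at_1m_dbm`, `path_loss_exponent`) and the `RollingRssi` name are placeholders chosen for the example.

```python
from collections import deque
from statistics import fmean


def estimate_distance_m(
    rssi_dbm: float,
    rssi_at_1m_dbm: float = -59.0,    # assumed calibration value
    path_loss_exponent: float = 2.0,  # assumed; 2.0 is free space
) -> float:
    """Estimate distance with the log-distance path loss model.

    distance = 10 ** ((rssi_at_1m - rssi) / (10 * n))
    """
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))


class RollingRssi:
    """Rolling average over the last `window` readings to reduce jitter."""

    def __init__(self, window: int = 5) -> None:
        self._samples: deque[float] = deque(maxlen=window)

    def add(self, rssi_dbm: float) -> float:
        """Record a reading and return the current smoothed value."""
        self._samples.append(rssi_dbm)
        return fmean(self._samples)


smoother = RollingRssi(window=3)
for reading in (-60.0, -70.0, -80.0, -90.0):
    smoothed = smoother.add(reading)

# Only the last three readings are still in the window at this point.
print(smoothed, estimate_distance_m(smoothed))
```

Real rooms, with walls and people in the way, are much noisier than this model suggests. That is a large part of why the filtering and multi-radio consensus features described above matter.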

What's the theme?

Highly coupled architecture from start to finish. Let's get more specific:

monitor.sh: Linux computer sensors 🠶 MQTT (🠶 roll your own frontend behaviors)
ESPresense: custom, _single purpose_ ESP32 sensors 🠶 MQTT 🠶 custom, Home Assistant-exclusive plugin
Bermuda: custom, multi-purpose ESP32 sensors 🠶 custom, Home Assistant-exclusive plugin
room-assistant: Linux computer sensors 🠶 custom consensus protocol over IP stack 🠶 custom integration to MQTT or Home Assistant

Two of these (monitor.sh and ESPresense) must go through MQTT, and a third (room-assistant) expects you to route information to MQTT afterward. If we're going to assume users are using an MQTT broker (which is significantly easier to add or swap than your fleet of sensor devices), why not use that to decouple sensors and behavior? I propose something like this:

any sensor that can send an MQTT payload 🠶 MQTT 🠶 behavior provider 🠶 MQTT 🠶 frontend
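As a rough sketch of what a 'behavior provider' in this pipeline might look like: it only needs to know which topics it reads and which it writes. Everything specific here is hypothetical, including the topic layouts (`sensors/<radio>/ble/<beacon>`, `behaviors/presence/<beacon>`), the JSON payload fields, and the `PresenceBehavior` class. To keep the example runnable without a broker, the provider is written as a plain object that takes one incoming message and returns the messages it would publish.

```python
import json

# Hypothetical contract for this example:
#   input : sensors/<radio>/ble/<beacon>   payload {"rssi": <number>}
#   output: behaviors/presence/<beacon>    payload {"room": ..., "rssi": ...}
OUTPUT_TEMPLATE = "behaviors/presence/{beacon}"


class PresenceBehavior:
    """Reports which radio currently hears each beacon most strongly."""

    def __init__(self) -> None:
        # beacon id -> radio name -> most recent RSSI from that radio
        self._latest: dict[str, dict[str, float]] = {}

    def handle(self, topic: str, payload: str) -> list[tuple[str, str]]:
        """Consume one message; return (topic, payload) pairs to publish."""
        parts = topic.split("/")
        if len(parts) != 4 or parts[0] != "sensors" or parts[2] != "ble":
            return []  # not part of this behavior's input contract
        radio, beacon = parts[1], parts[3]
        readings = self._latest.setdefault(beacon, {})
        readings[radio] = float(json.loads(payload)["rssi"])
        # Naive rule: the radio with the strongest signal wins.
        closest = max(readings, key=readings.get)
        out = json.dumps({"room": closest, "rssi": readings[closest]})
        return [(OUTPUT_TEMPLATE.format(beacon=beacon), out)]


behavior = PresenceBehavior()
behavior.handle("sensors/kitchen/ble/phone", '{"rssi": -70}')
print(behavior.handle("sensors/office/ble/phone", '{"rssi": -55}'))
```

Nothing in `PresenceBehavior` knows what hardware produced the reading or which frontend will consume the result. In a real deployment, an MQTT client library would feed `handle` from a subscription and publish whatever it returns.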

With this strategy, 'behavior' providers / libraries can specify their MQTT contract as input and output, and leave users to assemble any sensors that provide information in the correct format. In fact, if we're willing to have a few more MQTT topics (channels), we can translate between mismatched formats ourselves with something like:

any sensor 🠶 MQTT in sensor output format 🠶 (some general purpose event listener) 🠶 MQTT in behavior input format 🠶 behavior provider 🠶 MQTT in behavior output format 🠶 (some general purpose event listener) 🠶 MQTT in frontend input format 🠶 frontend
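The 'general purpose event listener' in this chain is essentially a translator driven by configuration. Here is a minimal sketch of that idea, assuming JSON payloads and a made-up rule format (`from`, `to`, `fields`). None of this reflects how m-cutie-t or any existing project actually works, and the topic names are invented for the example.

```python
import json
from typing import Optional

# Hypothetical rule format: each rule rewrites one topic/payload shape
# into another. '+' in "from" captures a topic level; "{0}" in "to"
# refers to the first captured level, "{mac}" to a payload field.
RULES = [
    {
        "from": "raspi/+/bluetooth",      # sensor output format
        "to": "sensors/{0}/ble/{mac}",    # behavior input format
        "fields": {"rssi": "signal"},     # output field <- input field
    },
]


def capture(topic_filter: str, topic: str) -> Optional[list[str]]:
    """Return the levels matched by '+', or None if the topic doesn't match."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    if len(filter_levels) != len(topic_levels):
        return None
    captured = []
    for f_level, t_level in zip(filter_levels, topic_levels):
        if f_level == "+":
            captured.append(t_level)
        elif f_level != t_level:
            return None
    return captured


def translate(rules: list[dict], topic: str, payload: str) -> list[tuple[str, str]]:
    """Apply every matching rule; return (topic, payload) pairs to publish."""
    data = json.loads(payload)
    outputs = []
    for rule in rules:
        levels = capture(rule["from"], topic)
        if levels is None:
            continue
        new_topic = rule["to"].format(*levels, **data)
        new_payload = {dst: data[src] for dst, src in rule["fields"].items()}
        outputs.append((new_topic, json.dumps(new_payload)))
    return outputs


print(translate(RULES, "raspi/hallway/bluetooth", '{"mac": "aa:bb", "signal": -61}'))
# prints [('sensors/hallway/ble/aa:bb', '{"rssi": -61}')]
```

Because the rules are plain data, they double as documentation of both contracts: anyone reading `RULES` can see what the sensor emits and what the behavior expects.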

An (unsurprising?) conclusion

I think this ecosystem would be better off with clearer contracts and more loosely coupled parts. The way to get there, I think, is to make MQTT the communication bus of record and to write that 'general purpose event listener' software in a configuration-first way. That approach encourages documenting input/output contracts and lets people provide their own sensors, behaviors, and frontends without needing to know as much about these systems' internals.

I've started working on a library that'll fill this niche for me (and hopefully be useful to others!) named m-cutie-t.