Appendix 01: Introduction to the Communication Protocols


2026-03-20

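1 OTA Version Check and Activation

Once initialization completes, the first thing the program must do is send a POST request to the server.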

Header
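Device-Id	Wi-Fi MAC address of the device
Client-Id	UUID; generated randomly on first boot and must be saved afterwards
Accept-Language	Language tag; we simply use zh-CN
User-Agent	Board name/firmware version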
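
As an illustration of the header table above, here is a minimal Python sketch that builds the four headers and persists the Client-Id across boots. The function names and the JSON file used for storage are assumptions made for this example; firmware would normally keep the UUID in non-volatile storage instead.

```python
import json
import uuid
from pathlib import Path


def load_or_create_client_id(path: Path) -> str:
    """Return the saved Client-Id, generating and saving one on first use."""
    if path.exists():
        return json.loads(path.read_text())["client_id"]
    client_id = str(uuid.uuid4())  # random UUID, created only once
    path.write_text(json.dumps({"client_id": client_id}))
    return client_id


def build_ota_headers(mac: str, client_id: str, board: str, version: str) -> dict:
    """Build the request headers listed in the table (hypothetical helper)."""
    return {
        "Device-Id": mac,
        "Client-Id": client_id,
        "Accept-Language": "zh-CN",
        "User-Agent": f"{board}/{version}",
    }
```

Calling `load_or_create_client_id` twice with the same path returns the same UUID, which is the property the server relies on to recognize a device it has already seen.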

{
  "application":
  {
    "compile_time": "Apr 28 2025",
    "elf_sha256": "c8ded1fda8e3716d3f5da1ebcf863a5d3bedcd8f273e6be9e9990fca377eb32c",
    "idf_version": "v5.3.2-dirty",
    "name": "template-app",
    "version": "7354bef-dirty"
  },
  "board":
  {
    "channel": 1,
    "ip": "192.168.1.100",
    "mac": "f0:f5:bd:50:a9:44",
    "name": "bread-compact-wifi",
    "rssi": -50,
    "ssid": "atguigu",
    "type": "bread-compact-wifi"
  },
  "chip_info":
  {
    "cores": 2,
    "features": 18,
    "model": 9,
    "revision": 2
  },
  "chip_model_name": "esp32s3",
  "flash_size": 8388608,
  "mac_address": "f0:f5:bd:50:a9:44",
  "minimum_free_heap_size": 8494904,
  "ota":
  {
    "label": "ota_0"
  },
  "partition_table":
  [
    {
      "address": 36864,
      "label": "nvs",
      "size": 16384,
      "subtype": 2,
      "type": 1
    },
    {
      "address": 53248,
      "label": "otadata",
      "size": 8192,
      "subtype": 0,
      "type": 1
    },
    {
      "address": 61440,
      "label": "phy_init",
      "size": 4096,
      "subtype": 1,
      "type": 1
    },
    {
      "address": 65536,
      "label": "model",
      "size": 983040,
      "subtype": 130,
      "type": 1
    },
    {
      "address": 1048576,
      "label": "ota_0",
      "size": 3670016,
      "subtype": 16,
      "type": 0
    },
    {
      "address": 4718592,
      "label": "ota_1",
      "size": 3670016,
      "subtype": 17,
      "type": 0
    }
  ],
  "psram_size": 8388608,
  "uuid": "3d3b749b-6a6a-4b33-ac88-3a5afe1c5a96",
  "version": 2
}

The request body is shown above and must be filled in to match your development board. After receiving the request, the server replies:

{
  "activation":
  {
    "challenge": "97582491-9a14-42a9-a18d-dca62eb67e63",
    "code": "284596",
    "message": "xiaozhi.me\n284596"
  },
  "firmware":
  {
    "url": "",
    "version": "7354bef-dirty"
  },
  "mqtt":
  {
    "client_id": "",
    "endpoint": "",
    "password": "",
    "publish_topic": "",
    "subscribe_topic": "",
    "username": ""
  },
  "server_time":
  {
    "timestamp": 1745824197471,
    "timezone_offset": 480
  },
  "websocket":
  {
    "token": "test-token",
    "url": "wss://api.tenclass.net/xiaozhi/v1/"
  }
}

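The reply contains several important parts:

(1) If the device has not been activated, the server sends activation information, which we need to show on the device's display.

(2) If the device firmware is out of date, the server sends OTA information (we disabled this part because the official XiaoZhi firmware is not compatible with our doorbell development board).

(3) The server information required by the MQTT protocol.

(4) The server timestamp and time zone.

(5) The information required by the WebSocket protocol.

This code is implemented in the OTA module and is independent of the transport protocol: whichever protocol is used, the packet above must be sent at startup.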
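
The handling of the reply can be sketched as follows. This is an illustrative Python outline, not the firmware's OTA module: the function name and the returned dictionary layout are hypothetical, and it assumes that `timestamp` is in milliseconds and `timezone_offset` is in minutes (480 would then mean UTC+8), which matches the sample values but is not stated by the protocol.

```python
import json
from datetime import datetime, timedelta, timezone


def summarize_ota_response(raw: str, current_version: str) -> dict:
    """Pick out the parts of the OTA reply the device acts on."""
    resp = json.loads(raw)
    summary = {}

    # (1) Activation info is only present while the device is not activated.
    activation = resp.get("activation")
    summary["activation_code"] = activation.get("code") if activation else None

    # (2) Assumption: an update is offered only when a URL is given
    # and the advertised version differs from the running one.
    firmware = resp.get("firmware", {})
    offered = firmware.get("url") and firmware.get("version") != current_version
    summary["update_url"] = firmware["url"] if offered else None

    # (4) Assumed units: milliseconds since the epoch, offset in minutes.
    server_time = resp.get("server_time")
    if server_time:
        tz = timezone(timedelta(minutes=server_time["timezone_offset"]))
        summary["local_time"] = datetime.fromtimestamp(
            server_time["timestamp"] / 1000, tz
        )
    else:
        summary["local_time"] = None

    # (5) Connection details for the WebSocket protocol.
    websocket = resp.get("websocket", {})
    summary["websocket_url"] = websocket.get("url")
    summary["websocket_token"] = websocket.get("token")
    return summary
```

With the sample reply above, the activation code is "284596" and no update is offered, because the firmware URL is empty.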

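2 Websocket Communication Protocol

During a conversation, the XiaoZhi robot needs a streaming channel to the server for sending and receiving opus audio data; it also needs to exchange protocol data in JSON format. The details of the protocol follow.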

2.1 Websocket Header

Header
Device-Id	Wi-Fi MAC address of the device
Client-Id	UUID
Authorization	Bearer WEBSOCKET_TOKEN
Protocol-Version	1
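
A short sketch of how these headers might be assembled, assuming the token is the `websocket.token` value returned by the OTA check; the helper name is hypothetical.

```python
def build_websocket_headers(mac: str, client_id: str, token: str) -> dict:
    """Build the headers sent when opening the WebSocket connection."""
    return {
        "Device-Id": mac,
        "Client-Id": client_id,
        "Authorization": f"Bearer {token}",  # token taken from the OTA reply
        "Protocol-Version": "1",
    }
```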

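2.2 Establishing the Channel

When the device is woken up (for example, when it recognizes the wake word), it must first send a handshake message to the server to establish the channel: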

{
  "audio_params":
  {
    "channels": 1,
    "format": "opus",
    "frame_duration": 60,
    "sample_rate": 16000
  },
  "transport": "websocket",
  "type": "hello",
  "version": 1
}

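The message must include the information the server needs:

(1) The number of channels in the opus audio being sent

(2) The encoding format (at the time of writing, the official server supports only opus)

(3) The opus frame duration

(4) The sample rate

The server replies: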

{
  "audio_params":
  {
    "channels": 1,
    "format": "opus",
    "frame_duration": 60,
    "sample_rate": 24000
  },
  "session_id": "e50e69eb",
  "transport": "websocket",
  "type": "hello",
  "version": 1
}

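The reply carries the sample rate, channel count, and other parameters of the opus segments that the server will send back. For the implementation, see the protocol code.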
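
The handshake can be sketched in Python as below. The function names are hypothetical and the validation rules are assumptions for illustration. Note that in the samples the device sends audio at 16000 Hz while the server announces 24000 Hz, so the decoder should be configured from the server's reply rather than from the device's own settings.

```python
import json


def build_hello(sample_rate: int = 16000, frame_duration: int = 60,
                channels: int = 1) -> str:
    """Build the device's hello message."""
    return json.dumps({
        "audio_params": {
            "channels": channels,
            "format": "opus",
            "frame_duration": frame_duration,
            "sample_rate": sample_rate,
        },
        "transport": "websocket",
        "type": "hello",
        "version": 1,
    })


def parse_server_hello(raw: str) -> dict:
    """Validate the server's hello and return the session parameters."""
    msg = json.loads(raw)
    if msg.get("type") != "hello" or msg.get("transport") != "websocket":
        raise ValueError("unexpected handshake reply")
    params = msg.get("audio_params", {})
    return {
        "session_id": msg["session_id"],
        "sample_rate": params.get("sample_rate"),
        "frame_duration": params.get("frame_duration"),
        "channels": params.get("channels"),
    }


def samples_per_frame(sample_rate: int, frame_duration_ms: int) -> int:
    """Number of PCM samples per channel in one decoded frame."""
    return sample_rate * frame_duration_ms // 1000
```

For the sample reply, one decoded 60 ms frame at 24000 Hz holds 24000 * 60 / 1000 = 1440 samples per channel, while one captured frame at 16000 Hz holds 960.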

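2.3 IoT Module Control Messages

After the channel is established, the device must send the server a description of its IoT peripherals so that the AI can call them.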

{
  "descriptors":
  [
    {
      "description": "Speaker",
      "methods":
      {
        "SetMute":
        {
          "description": "Set mute status",
          "parameters":
          {
            "mute":
            {
              "description": "Mute status",
              "type": "boolean"
            }
          }
        },
        "SetVolume":
        {
          "description": "Set volume level",
          "parameters":
          {
            "volume":
            {
              "description": "Volume level[0-100]",
              "type": "number"
            }
          }
        }
      },
      "name": "Speaker",
      "properties":
      {
        "mute":
        {
          "description": "Mute status",
          "type": "boolean"
        },
        "volume":
        {
          "description": "Volume level[0-100]",
          "type": "number"
        }
      }
    }
  ],
  "session_id": "e50e69eb",
  "type": "iot",
  "update": true
}

The device must also send its current state to the server:

{
  "session_id": "e50e69eb",
  "states":
  [
    {
      "name": "Speaker",
      "state":
      {
        "mute": false,
        "volume": 60
      }
    }
  ],
  "type": "iot",
  "update": true
}

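The server does not reply to either of these messages. Wrapping this logic in C is quite cumbersome; see the IoT control module for details. We also need to implement a callback in the IoT control module: when the server sends a method-call JSON like the ones below, the device must adjust the corresponding parameter: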

{
  "commands":
  [
    {
      "method": "SetVolume",
      "name": "Speaker",
      "parameters":
      {
        "volume": 100
      }
    }
  ],
  "session_id": "39e871fb",
  "type": "iot"
}

=================================================================

{
  "commands":
  [
    {
      "method": "SetMute",
      "name": "Speaker",
      "parameters":
      {
        "mute": true
      }
    }
  ],
  "session_id": "39e871fb",
  "type": "iot"
}
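
The callback described above can be sketched as a small dispatcher. This is a Python illustration under assumed names (`Speaker`, `handle_iot_commands`, `build_states_message`), not the IoT control module itself; clamping the volume to 0-100 follows the range given in the descriptor.

```python
import json


class Speaker:
    """Hypothetical in-memory model of the Speaker peripheral."""

    def __init__(self):
        self.state = {"mute": False, "volume": 60}

    def invoke(self, method: str, params: dict) -> None:
        if method == "SetVolume":
            # Keep the value inside the advertised 0-100 range.
            self.state["volume"] = max(0, min(100, int(params["volume"])))
        elif method == "SetMute":
            self.state["mute"] = bool(params["mute"])
        else:
            raise ValueError(f"unknown method: {method}")


def handle_iot_commands(raw: str, things: dict) -> int:
    """Apply every command in an iot message; return how many were applied."""
    msg = json.loads(raw)
    if msg.get("type") != "iot":
        return 0
    applied = 0
    for cmd in msg.get("commands", []):
        thing = things.get(cmd["name"])
        if thing is None:
            continue  # command for a peripheral this device does not have
        thing.invoke(cmd["method"], cmd.get("parameters", {}))
        applied += 1
    return applied


def build_states_message(session_id: str, things: dict) -> str:
    """Build the state report sent to the server."""
    states = [{"name": name, "state": dict(thing.state)}
              for name, thing in things.items()]
    return json.dumps({"session_id": session_id, "states": states,
                       "type": "iot", "update": True})
```

A device would register its peripherals once, for example `things = {"Speaker": Speaker()}`, and pass every incoming `iot` message through `handle_iot_commands`.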

2.4 Conversation

1) Send the wake word

After the device wakes up, we first need to send the wake word to the server:

{
  "session_id": "e50e69eb",
  "state": "detect",
  "text": "你好小智",
  "type": "listen"
}

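The server then replies with audio and text; the content of these replies is explained later.

2) Send a listen request

Next we need to send a listen request to the server. Without it, the server will not process the opus audio segments we send.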

{
  "mode": "auto",
  "session_id": "e50e69eb",
  "state": "start",
  "type": "listen"
}

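3) Send opus segments

Once the listen request has been sent, we can start sending opus segments. The server runs STT on the audio segments and replies to us. When sending audio segments, the WebSocket frame must be marked as binary.

4) End request

When we want to end the current round of conversation ourselves, we can send the server a stop request.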

{
  "session_id": "e373d277",
  "state": "stop",
  "type": "listen"
}
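
The three `listen` messages in this section differ only in their `state` and one optional field, so they can be produced by small builders. The sketch below uses hypothetical function names; the field values are taken from the samples above.

```python
import json


def wake_word_detected(session_id: str, text: str) -> str:
    """Step 1: report the detected wake word."""
    return json.dumps({"session_id": session_id, "state": "detect",
                       "text": text, "type": "listen"}, ensure_ascii=False)


def start_listening(session_id: str, mode: str = "auto") -> str:
    """Step 2: ask the server to start processing uploaded opus segments."""
    return json.dumps({"mode": mode, "session_id": session_id,
                       "state": "start", "type": "listen"})


def stop_listening(session_id: str) -> str:
    """Step 4: end the current round of conversation."""
    return json.dumps({"session_id": session_id, "state": "stop",
                       "type": "listen"})
```

These messages go out as text frames, whereas the opus segments of step 3 go out as binary frames on the same connection.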

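2.5 Parsing Server Replies

After we send the wake word or opus segments, the server replies. The replies fall into the following kinds:

1) Speech recognition result

The server runs speech recognition on the opus segments we send and returns the result as JSON, which we can show on the device: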

{
  "session_id": "e373d277",
  "text": "今天天气怎么样",
  "type": "stt"
}

2) LLM emotion reply

Along with its answer, the LLM returns an emotion. We can show a matching animation on the device to make the robot more lively.

{
  "emotion": "happy",
  "session_id": "e373d277",
  "text": "😊",
  "type": "llm"
}

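3) Text reply

A text reply has three states: start, stop, and sentence_start.

The server sends start at the beginning of each reply, sends sentence_start once for every sentence, and sends stop when the reply ends.

The server sends the LLM's answer to us as text, which we can show on the device: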

{
  "session_id": "e373d277",
  "state": "start",
  "type": "tts"
}

{
  "session_id": "e373d277",
  "state": "sentence_start",
  "text": "今天北京天气挺好的，晴朗，",
  "type": "tts"
}

{
  "session_id": "e373d277",
  "state": "sentence_start",
  "text": "气温21度，西北风5级",
  "type": "tts"
}

{
  "session_id": "e373d277",
  "state": "stop",
  "type": "tts"
}

4) opus audio segments

The spoken reply is sent back as audio segments in binary packets. We need to decode and play every binary packet.
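
The four kinds of reply can be routed with a single handler that first separates binary frames from text frames and then switches on `type`. The sketch below is an assumption-laden outline: `display` and `audio_queue` are hypothetical stand-ins for the screen and the opus decoder input, and decoding itself is left out.

```python
import json


def handle_server_frame(frame, display: dict, audio_queue: list) -> str:
    """Route one WebSocket frame and return a short tag naming its kind."""
    if isinstance(frame, (bytes, bytearray)):
        # Binary frames carry opus audio; queue them for decoding.
        audio_queue.append(bytes(frame))
        return "audio"

    msg = json.loads(frame)
    kind = msg.get("type")
    if kind == "stt":
        display["user_text"] = msg["text"]      # what the user said
    elif kind == "llm":
        display["emotion"] = msg["emotion"]     # drives the face animation
    elif kind == "tts":
        state = msg.get("state")
        if state == "start":
            display["speaking"] = True
        elif state == "sentence_start":
            display["reply_text"] = msg["text"]  # one sentence of the reply
        elif state == "stop":
            display["speaking"] = False
        return f"tts:{state}"
    return kind
```

Tracking `speaking` between the tts start and stop messages gives the device a simple way to know whether the robot is currently talking.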

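2.6 Interrupting the Conversation

While the AI robot is speaking, we can send an abort request. The robot then stops talking, and we can send another listen request to begin the next round of conversation.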

{
  "reason": "wake_word_detected",
  "session_id": "e373d277",
  "type": "abort"
}
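
One way to combine the abort message with the next listen request is sketched below. The function names are hypothetical, and sending abort only while the robot is speaking is an assumption made for this example, not a rule stated by the protocol.

```python
import json


def build_abort(session_id: str, reason: str = "wake_word_detected") -> str:
    """Build the abort message shown above."""
    return json.dumps({"reason": reason, "session_id": session_id,
                       "type": "abort"})


def on_wake_word(session_id: str, speaking: bool) -> list:
    """Return the messages to send when the wake word fires mid-session."""
    messages = []
    if speaking:
        # Interrupt the current reply before starting a new round.
        messages.append(build_abort(session_id))
    messages.append(json.dumps({"mode": "auto", "session_id": session_id,
                                "state": "start", "type": "listen"}))
    return messages
```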