ServerBee

Monitoring

Real-time system monitoring, dashboards, and historical data.

ServerBee provides real-time monitoring of all your connected servers through a unified web dashboard. Metrics are streamed over WebSocket for instant updates without polling.

Dashboard Overview

The main dashboard shows all registered servers with their current status at a glance:

  • Online/Offline status with color indicators
  • CPU usage percentage with visual bar
  • Memory usage (used / total) with percentage
  • Disk usage (used / total) with percentage
  • Network throughput (upload/download speed)
  • Load average (1/5/15 minute)
  • Uptime duration
  • Region and country flags (when GeoIP is enabled)

Servers are organized by groups and sorted by weight. You can filter, search, and batch-operate on servers from this view.

Real-Time Updates

The browser connects to the server via WebSocket at /ws/browser. The communication flow works as follows:

  1. On initial connection, the server sends a FullSync message containing the current state of all servers
  2. As agents report new metrics, the server broadcasts Update messages to all connected browsers
  3. When an agent connects or disconnects, ServerOnline / ServerOffline events are sent

This means the dashboard updates in real time -- there is no need to refresh the page or wait for polling intervals.
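
The flow above can be sketched as a client-side state reducer. The message names (FullSync, Update, ServerOnline, ServerOffline) come from this page; the payload shapes (id, metrics, online) are illustrative assumptions, not ServerBee's actual wire schema:

```typescript
// Minimal reducer for the /ws/browser message flow.
// Payload field names are assumptions for illustration.
type ServerState = { online: boolean; metrics?: Record<string, number> };

type BrowserMessage =
  | { type: "FullSync"; servers: Record<string, ServerState> }
  | { type: "Update"; id: string; metrics: Record<string, number> }
  | { type: "ServerOnline"; id: string }
  | { type: "ServerOffline"; id: string };

function reduce(
  state: Record<string, ServerState>,
  msg: BrowserMessage,
): Record<string, ServerState> {
  switch (msg.type) {
    case "FullSync":
      // Initial connection: replace local state wholesale.
      return { ...msg.servers };
    case "Update":
      // Periodic metric broadcast: merge into the matching server.
      return {
        ...state,
        [msg.id]: { ...state[msg.id], online: true, metrics: msg.metrics },
      };
    case "ServerOnline":
      return { ...state, [msg.id]: { ...state[msg.id], online: true } };
    case "ServerOffline":
      return { ...state, [msg.id]: { ...state[msg.id], online: false } };
  }
}
```

Because each message is a pure state transition, the browser never needs to re-fetch: applying messages in arrival order keeps the dashboard consistent with the server.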

Metric Types

The agent collects the following metrics at a configurable interval (default: every 3 seconds):

System Resources

Metric       Unit   Description
CPU Usage    %      Overall CPU utilization (0-100)
Memory Used  bytes  Current RAM usage
Swap Used    bytes  Current swap space usage
Disk Used    bytes  Total disk space consumption

Network

Metric                Unit       Description
Network In Speed      bytes/sec  Current download speed
Network Out Speed     bytes/sec  Current upload speed
Network In Transfer   bytes      Cumulative total download since agent start
Network Out Transfer  bytes      Cumulative total upload since agent start

System Load

Metric            Description
Load Average 1m   System load over the last 1 minute
Load Average 5m   System load over the last 5 minutes
Load Average 15m  System load over the last 15 minutes

Connections and Processes

Metric           Description
TCP Connections  Number of active TCP connections
UDP Connections  Number of active UDP connections
Process Count    Total number of running processes

Environment

Metric       Description
Temperature  CPU temperature in degrees Celsius (optional)
Uptime       System uptime in seconds

Disk I/O

The agent collects per-disk read/write throughput on all major platforms:

Metric       Unit       Description
Read Speed   bytes/sec  Read throughput per disk
Write Speed  bytes/sec  Write throughput per disk

Linux: Reads /proc/diskstats directly. Only physical block devices are tracked (e.g., sda, nvme0n1). Virtual devices (loop*, dm-*, ram*, sr*) and partitions are excluded. DiskIo.name is the block device name.

macOS / Windows: Uses the sysinfo Disk::usage() API with mount_point() as the key. This provides per-mount-path semantics (e.g., /, /home, C:\) rather than per-physical-disk. Known limitation: on macOS with APFS, multiple volumes sharing one physical disk may report overlapping I/O counters.

On the first sample after agent startup, a baseline is established and an empty list is reported. Subsequent samples compute delta-based rates.
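
The baseline-then-delta behavior can be sketched as follows. The field names (readBytes, writeBytes) and the shape of the sampler are illustrative assumptions, not the agent's actual implementation:

```typescript
// Sketch of baseline-then-delta disk I/O sampling.
// Field names are assumptions for illustration.
type Counters = { readBytes: number; writeBytes: number };
type Rate = { name: string; readPerSec: number; writePerSec: number };

let baseline: Map<string, Counters> | null = null;

function sample(now: Map<string, Counters>, elapsedSec: number): Rate[] {
  if (baseline === null) {
    // First sample after startup: record a baseline, report nothing.
    baseline = now;
    return [];
  }
  const rates: Rate[] = [];
  for (const [name, cur] of now) {
    const prev = baseline.get(name);
    if (prev) {
      // Delta of cumulative counters divided by elapsed time gives rate.
      rates.push({
        name,
        readPerSec: (cur.readBytes - prev.readBytes) / elapsedSec,
        writePerSec: (cur.writeBytes - prev.writeBytes) / elapsedSec,
      });
    }
  }
  baseline = now;
  return rates;
}
```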

Disk I/O data is stored as a JSON column (disk_io_json) in the records and records_hourly tables. The hourly aggregator computes per-device average read/write rates.

GPU (Optional)

When GPU monitoring is enabled (enable_gpu = true), per-device metrics are collected:

Metric            Description
GPU Name          Device model name
GPU Utilization   GPU core utilization percentage
GPU Memory Used   VRAM usage in bytes
GPU Memory Total  Total VRAM in bytes
GPU Temperature   Device temperature in degrees Celsius

Server Information

In addition to periodic metrics, each agent reports static system information when it first connects:

  • CPU name, core count, and architecture
  • Operating system and kernel version
  • Total memory, swap, and disk capacity
  • IPv4 and IPv6 addresses
  • Virtualization type (KVM, Xen, Docker, etc.)
  • Agent version

This information is displayed on the server detail page and stored in the database.

Historical Data and Charts

ServerBee stores metric records at two levels of granularity:

Raw Records

  • Written every 60 seconds by the RecordWriter background task
  • Retained for 7 days by default (configurable via retention.records_days)
  • Each record captures all metric values at a single point in time

Hourly Aggregated Records

  • Computed by the Aggregator background task
  • Averages of all raw records within each hour
  • Retained for 90 days by default (configurable via retention.records_hourly_days)
  • Used for long-term trend visualization

The dashboard charts automatically switch between raw and hourly data depending on the selected time range.
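
The switch between raw and hourly data can be sketched as a small helper. The 24-hour cutoff is an assumption inferred from the time range table on this page, not a confirmed ServerBee constant:

```typescript
// Pick a data granularity for a chart time range.
// The 24h cutoff is an assumed threshold for illustration.
type Granularity = "raw" | "hourly";

function granularityFor(rangeHours: number): Granularity {
  // Raw records are only retained for days, so longer ranges
  // fall back to hourly aggregates.
  return rangeHours <= 24 ? "raw" : "hourly";
}
```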

GPU Records

GPU metrics are stored separately in a dedicated table with per-device granularity. Each record includes the device index, name, memory, utilization, and temperature. These are retained for 7 days by default.

Server Groups

Organize your servers into logical groups for easier management:

  • Create groups with custom names and sort weights
  • Assign servers to groups
  • Groups appear as sections in the dashboard
  • Sort weight controls the display order (lower weight = higher position)

Groups can represent environments (production, staging), regions (US-East, EU-West), providers (AWS, Hetzner), or any other organizational structure that makes sense for your setup.

Server Details

Each server has a detail page showing:

  • Real-time streaming charts (default mode)
  • System information (hardware, OS, network)
  • Historical trend charts with time range selection
  • Disk I/O charts with merged and per-disk views (all platforms, historical mode)
  • 90-day uptime timeline with daily availability breakdown
  • Server metadata (group, tags, remarks, pricing)
  • Actions (terminal access, edit, delete)

Real-Time Charts

The server detail page defaults to Real-time mode. In this mode, charts display live data streamed from WebSocket updates:

  • Data source: Accumulated from BrowserMessage::Update events via the ['servers'] TanStack Query cache
  • Update interval: ~3 seconds (matches the agent report interval)
  • Buffer size: 10-minute ring buffer (~200 data points), automatically trimmed
  • Deduplication: Uses the server-side last_active timestamp to filter duplicate events
  • Available charts: CPU, Memory, Disk, Network In/Out, Load Average (1m)
  • Time axis: First tick shows HH:mm:ss, subsequent ticks show mm:ss

Temperature, GPU, and Disk I/O charts are not available in real-time mode because the WebSocket ServerStatus message does not include these fields. Switch to a historical view to see temperature, GPU, and disk I/O data.
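
The buffer and deduplication behavior described above can be sketched as a pure function. The point shape and the exact trim rule are illustrative assumptions:

```typescript
// Sketch of a 10-minute ring buffer with last_active deduplication.
// The point shape (lastActive in unix seconds, cpu) is an assumption.
type Point = { lastActive: number; cpu: number };

const WINDOW_SEC = 600; // 10-minute buffer

function push(buffer: Point[], point: Point): Point[] {
  // Deduplicate: the server-side last_active timestamp identifies a
  // report, so a repeated broadcast of the same report is dropped.
  if (
    buffer.length > 0 &&
    buffer[buffer.length - 1].lastActive === point.lastActive
  ) {
    return buffer;
  }
  const next = [...buffer, point];
  // Trim anything older than the window relative to the newest point.
  const cutoff = point.lastActive - WINDOW_SEC;
  return next.filter((p) => p.lastActive >= cutoff);
}
```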

Disk I/O Charts

When historical disk I/O data is available, the server detail page displays a Disk I/O chart with two views:

  • Merged -- Combined read/write throughput across all physical disks
  • Per Disk -- Individual charts for each physical disk (e.g., sda, nvme0n1)

Both views show read speed (blue) and write speed (green) as area charts. Missing data points are filled with zero values to maintain a continuous timeline.

Uptime Timeline

The server detail page includes an uptime card with a 90-day timeline. Each day is shown as a colored bar:

  • Green -- 100% uptime
  • Yellow -- Below the yellow threshold (degraded)
  • Red -- Below the red threshold (major outage)
  • Gray -- No data

Uptime data is queried via GET /api/servers/{server_id}/uptime-daily?days=90. The endpoint returns an UptimeDailyEntry per day with date, online_minutes, total_minutes, and uptime_percent fields. Missing dates are gap-filled with zero values.
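
The gap-filling can be sketched as follows. The UptimeDailyEntry field names match this page; representing dates as ISO day strings is a simplification for illustration:

```typescript
// Fill missing days with zero entries so the 90-day timeline has no holes.
// Zero total_minutes renders as the gray "No data" state.
type UptimeDailyEntry = {
  date: string; // "YYYY-MM-DD"
  online_minutes: number;
  total_minutes: number;
  uptime_percent: number;
};

function gapFill(
  entries: UptimeDailyEntry[],
  days: string[],
): UptimeDailyEntry[] {
  const byDate = new Map(
    entries.map((e): [string, UptimeDailyEntry] => [e.date, e]),
  );
  return days.map(
    (date) =>
      byDate.get(date) ?? {
        date,
        online_minutes: 0,
        total_minutes: 0,
        uptime_percent: 0,
      },
  );
}
```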

Network Quality Views

The /network overview and /network/{server_id} detail pages summarize configured probe targets for each server.

  • Newly assigned targets appear immediately, even before the first probe result is written
  • Targets without probe data render a no-data state instead of disappearing from the summary
  • The overview search box follows the active UI language

Time Range Selector

The time range bar offers these options:

Mode       Data Source                Description
Real-time  WebSocket ring buffer      Live streaming data (default)
1h         REST API (raw records)     Last 1 hour from database
6h         REST API (raw records)     Last 6 hours
24h        REST API (raw records)     Last 24 hours
7d         REST API (hourly records)  Last 7 days (aggregated)
30d        REST API (hourly records)  Last 30 days (aggregated)

When switching from Real-time to a historical view, the REST API queries are enabled and chart data loads from the database. When switching back to Real-time, the accumulated buffer data is displayed immediately (data continues to accumulate in the background even while viewing historical data).

Data Flow

Agent                  Server                    Browser
  |                      |                         |
  |-- Report (3s) ------>|                         |
  |                      |-- cache in AgentManager |
  |                      |                         |
  |                      |-- Update (broadcast) -->|
  |                      |                     real-time UI
  |                      |                         |
  |                      |-- RecordWriter (60s)    |
  |                      |   writes to SQLite      |
  |                      |                         |
  |                      |-- Aggregator (hourly)   |
  |                      |   hourly averages       |
  |                      |                         |
  |                      |-- Cleanup (hourly)      |
  |                      |   delete old records    |
The agent reports every 3 seconds. The server caches the latest report in memory and immediately broadcasts it to connected browsers. Every 60 seconds, all cached reports are batch-written to SQLite. Every hour, raw records are aggregated into hourly summaries, and expired data is cleaned up based on retention settings.

Traffic Statistics

ServerBee tracks network traffic at hourly and daily granularity, enabling billing-cycle-aware usage monitoring with prediction capabilities.

How It Works

Traffic tracking is integrated into the existing metric recording pipeline:

  1. Each agent reports cumulative network byte counters (net_in_transfer, net_out_transfer) every 3 seconds
  2. The RecordWriter calculates the delta between consecutive reports and accumulates hourly traffic
  3. The Aggregator rolls up hourly data into daily totals (timezone-aware via scheduler.timezone)
  4. The Cleanup task removes expired traffic records based on retention settings
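
Step 2 can be sketched as a delta accumulator over the cumulative counters. Treating a negative delta as a counter reset (e.g., after an agent restart) is my assumption for robustness, not documented ServerBee behavior:

```typescript
// Accumulate hourly traffic from a cumulative byte counter.
// Reset handling (negative delta) is an assumption for illustration.
let prevCounter: number | null = null;
let hourlyBytes = 0;

function accumulate(counter: number): number {
  if (prevCounter !== null) {
    const delta = counter - prevCounter;
    // A shrinking counter means the source restarted from zero;
    // count the new absolute value instead of a negative delta.
    hourlyBytes += delta >= 0 ? delta : counter;
  }
  prevCounter = counter;
  return hourlyBytes;
}
```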

Traffic API

Query traffic statistics for any server via GET /api/servers/{id}/traffic. The response includes:

  • Cycle totals -- Total bytes in/out for the current billing cycle
  • Usage percentage -- Traffic used vs. the server's traffic limit (if configured)
  • Prediction -- Estimated end-of-cycle usage based on current consumption rate
  • Daily breakdown -- Per-day traffic totals within the billing cycle
  • Hourly breakdown -- Per-hour traffic totals for today
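
The prediction can be sketched as a linear extrapolation. This page only says the estimate is based on the current consumption rate; the exact used/elapsed formula below is my assumption about how it is computed:

```typescript
// Estimate end-of-cycle usage by extrapolating the average daily rate.
// The linear used/elapsed formula is an assumed model, not confirmed.
function predictCycleUsage(
  usedBytes: number,
  elapsedDays: number,
  cycleDays: number,
): number {
  if (elapsedDays <= 0) return usedBytes; // cycle just started: no rate yet
  return (usedBytes / elapsedDays) * cycleDays;
}
```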

Billing Cycles

Traffic is queried within billing cycles determined by:

  • billing_cycle -- The cycle type: monthly (default), quarterly, or yearly
  • billing_start_day -- The day of month when the billing cycle starts (1-28, default 1)

For example, if billing_start_day = 15 and billing_cycle = monthly, the cycle runs from the 15th of each month to the 14th of the next month.
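
A minimal sketch of the monthly boundary rule, assuming UTC dates. Because billing_start_day is capped at 28, no month-length edge cases arise; quarterly and yearly cycles are omitted here:

```typescript
// Find the start of the monthly billing cycle containing `now`.
// Assumes UTC; billing_start_day is 1-28 per the docs.
function cycleStart(now: Date, billingStartDay: number): Date {
  const y = now.getUTCFullYear();
  const m = now.getUTCMonth();
  if (now.getUTCDate() >= billingStartDay) {
    return new Date(Date.UTC(y, m, billingStartDay));
  }
  // Before the start day: the cycle began in the previous month.
  return new Date(Date.UTC(y, m - 1, billingStartDay));
}
```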

Frontend Display

The server detail page shows a collapsible traffic card with:

  • Progress bar -- Visual usage against the traffic limit
  • Daily chart -- Bar chart showing daily in/out traffic within the cycle
  • Hourly chart -- Bar chart showing today's hourly traffic breakdown
  • Prediction -- Estimated total usage by end of cycle

Data Retention

Data Type       Default Retention  Config Key
Traffic hourly  7 days             retention.traffic_hourly_days
Traffic daily   400 days           retention.traffic_daily_days

Scheduler Timezone

Daily traffic aggregation respects the scheduler.timezone setting. Set this to your billing timezone (e.g., Asia/Shanghai) to ensure daily totals align with your billing provider's day boundaries.

Docker Container Monitoring

ServerBee supports real-time Docker container monitoring when the agent has access to the Docker daemon. This feature requires the CAP_DOCKER capability to be enabled.

Overview

The Docker monitoring page shows:

  • Overview cards -- Running/Stopped container counts, total CPU usage, total memory usage, Docker version
  • Container list -- Searchable, filterable table with Name, Image, Status, CPU%, Memory, Network I/O
  • Events timeline -- Container lifecycle events (start, stop, die, create, destroy) in reverse chronological order
  • Networks dialog -- List of Docker networks with driver, scope, and container count
  • Volumes dialog -- List of Docker volumes with driver, mountpoint, and creation time

Container Details

Click any container row to view:

  • Container info -- Image, status, ports, creation time, container ID
  • Stats cards -- CPU usage, memory (with progress bar), network I/O, block I/O
  • Log stream -- Real-time log output with:
    • stdout in white text, stderr in red text
    • Follow toggle for auto-scrolling
    • Clear button to reset the log view
    • Connection status indicator (Connected/Disconnected)

How It Works

  1. When browsers navigate to the Docker page, a DockerSubscribe message is sent via WebSocket
  2. The server instructs the connected agent to start Docker monitoring
  3. The agent uses the bollard Docker API client to poll containers and stats
  4. Updates are broadcast to subscribing browsers in real time
  5. When all browsers leave the Docker page, monitoring stops automatically (viewer ref-counting)

Container logs use a dedicated WebSocket endpoint (/api/ws/docker/logs/{server_id}) with subscribe/unsubscribe protocol for per-container log streaming.
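
The viewer ref-counting in step 5 can be sketched as follows. The start/stop hooks are illustrative; ServerBee's internal types are not shown in these docs:

```typescript
// Monitoring runs while at least one browser views the Docker page.
// This counter shape is an illustrative sketch, not ServerBee's code.
class ViewerRefCount {
  private viewers = 0;
  public monitoring = false;

  subscribe(): void {
    if (this.viewers === 0) this.monitoring = true; // first viewer starts polling
    this.viewers += 1;
  }

  unsubscribe(): void {
    this.viewers = Math.max(0, this.viewers - 1);
    if (this.viewers === 0) this.monitoring = false; // last viewer stops it
  }
}
```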

Data Retention

Data Type      Default Retention  Config Key
Docker events  7 days             retention.docker_events_days

Container stats and logs are not persisted -- they are streamed in real time only.

Network Quality Monitoring

ServerBee includes a built-in network quality monitoring system that probes network targets from each agent and visualizes latency, packet loss, and anomalies.

Preset Targets

96 preset probe targets are embedded in the server binary (not stored in the database):

  • China Telecom -- 31 provincial nodes (TCP probe via Zstatic CDN)
  • China Unicom -- 31 provincial nodes (TCP probe via Zstatic CDN)
  • China Mobile -- 31 provincial nodes (TCP probe via Zstatic CDN)
  • International -- Cloudflare (1.1.1.1), Google DNS (8.8.8.8), AWS Tokyo (ICMP probe)

China targets use domain names ({province}-{isp}-v4.ip.zstaticcdn.com:80) that auto-resolve to the latest CDN node IPs. No IP maintenance is needed.

Preset targets are read-only and cannot be edited or deleted. You can also create custom targets via the settings page.

Configuration

Navigate to Settings > Network Probes to configure:

  • Target Management -- View all 96 preset targets (with lock icon and ISP label) and manage custom targets (create/edit/delete)
  • Global Settings -- Probe interval (30-600 seconds, default 60), packets per round (5-20, default 10), and default targets auto-assigned to new servers
  • Per-Server Targets -- Assign up to 20 probe targets per server via the network detail page's "Manage Targets" dialog

Network Overview Page

The /network page shows a card for each server with:

  • Target count and average latency
  • Availability percentage
  • Anomaly count with severity indicators

Network Detail Page

Click a server card to see /network/:serverId with:

  • Time range selector -- Real-time, 1h, 6h, 24h, 7d, 30d
  • Target cards -- Per-target latency and packet loss, with toggle visibility
  • Multi-line latency chart -- One colored line per target, tooltips with timestamps
  • Anomaly table -- High latency, high packet loss, and unreachable events
  • Statistics bar -- Average latency, availability percentage, target count
  • CSV export -- Download probe data for the selected time range

Data Retention

Network probe records follow the same two-tier storage as system metrics:

  • Raw records -- Retained for 7 days (configurable via retention.network_probe_days)
  • Hourly aggregates -- Retained for 90 days (configurable via retention.network_probe_hourly_days)

Alert Integration

Two alert rule types are available for network quality:

  • network_latency -- Triggers when average latency exceeds a threshold
  • network_packet_loss -- Triggers when packet loss exceeds a threshold
