Troubleshooting
This guide helps you diagnose and resolve common issues with TwinEdge Edge devices.
Quick Diagnostics
System Status Check
Run the diagnostic script:
twinedge diagnose
Or manually check:
# Check all services
docker compose ps
# View resource usage
docker stats
# Check disk space
df -h
# Check memory
free -h
Common Status Indicators
| Status | Meaning | Action |
|---|---|---|
| 🟢 Healthy | All systems operational | None needed |
| 🟡 Warning | Non-critical issue | Investigate soon |
| 🔴 Critical | System impaired | Immediate action |
Connection Issues
Cannot Connect to Device
Symptoms:
- Dashboard not loading
- SSH connection refused
- Ping timeouts
Solutions:
- Verify network connectivity
  # From another device on the network
  ping DEVICE_IP
- Check if device is powered on
  - Verify power LED
  - Check power supply voltage
- Verify IP address
  # On the device
  hostname -I
  ip addr show
- Check firewall rules
  sudo ufw status
  sudo iptables -L
Cloud Connection Failed
Symptoms:
- "Disconnected" status in dashboard
- Data not syncing to cloud
- Remote access unavailable
Solutions:
- Verify internet connectivity
  ping 8.8.8.8
  curl -I https://api.twinedgeai.com
- Check DNS resolution
  nslookup api.twinedgeai.com
- Verify API credentials
  grep API_KEY /opt/twinedge/.env
- Check MQTT connection
  docker compose logs mqtt-client
- Review firewall/proxy settings
  - Port 443 (HTTPS) must be open outbound
  - Port 8883 (MQTTS) must be open outbound
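The outbound port requirements above can also be checked from a script, which helps when `curl` or `nc` is not installed on the device. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes the MQTT broker is reachable at the same hostname as the API, which this guide does not confirm; substitute your broker's hostname if it differs.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def report(host: str, ports=(443, 8883)) -> None:
    """Print one line per port. The host is an assumption, see the note above."""
    for port in ports:
        state = "open" if port_is_reachable(host, port) else "blocked or unreachable"
        print(f"{host}:{port} {state}")
```

Calling `report("api.twinedgeai.com")` on the device prints the state of both ports. A successful TCP connection only shows that the firewall allows the traffic; it does not validate TLS or credentials.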
OPC UA Connection Issues
Symptoms:
- "Connection refused" errors
- "BadSecurityChecksFailed" errors
- Timeout errors
Solutions:
- Verify server endpoint
  # Test connectivity
  nc -zv SERVER_IP 4840
- Check security settings
  - Match security mode/policy with server
  - Verify certificates are installed
- Test with UaExpert or another client
  - Isolate whether the issue is in TwinEdge or the server
- Review server logs
  docker compose logs opcua-server
- Certificate issues
  # List trusted certificates
  ls /opt/twinedge/certs/trusted/
  # Import server certificate
  cp server_cert.der /opt/twinedge/certs/trusted/
  docker compose restart opcua-server
Modbus Connection Issues
Symptoms:
- No response from device
- Incorrect values
- Timeout errors
Solutions:
- Verify network connectivity
  ping MODBUS_DEVICE_IP
  telnet MODBUS_DEVICE_IP 502
- Check unit ID
  - Verify slave address matches device configuration
  - Try unit ID 1 (common default)
- Test with mbpoll
  # Install mbpoll
  sudo apt install mbpoll
  # Read holding registers
  mbpoll -a 1 -r 1 -c 10 DEVICE_IP
- Verify register addresses
  - Check device documentation
  - Note: Some devices use 0-based addressing, others 1-based
- Check byte order
  - Test different byte order settings
  - Common: big_endian, little_endian
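The 0-based versus 1-based note above is a frequent source of confusion. Many device manuals list holding registers in the 1-based Modicon style (40001, 40002, ...), while the Modbus protocol itself carries a 0-based offset. The sketch below shows the conversion. The function name is hypothetical, and whether TwinEdge expects the documented number or the offset in its configuration is not stated in this guide, so check the data source settings.

```python
def holding_register_offset(documented_address: int) -> int:
    """Convert a 1-based Modicon-style holding register number to a 0-based offset.

    Accepts the 5-digit form (40001-49999) and the 6-digit form (400001-465536).
    """
    if 40001 <= documented_address <= 49999:
        return documented_address - 40001
    if 400001 <= documented_address <= 465536:
        return documented_address - 400001
    raise ValueError(f"{documented_address} is not a holding register number")
```

For example, a manual entry of 40010 corresponds to offset 9 on the wire. If a value appears shifted by one register, trying the neighbouring address is a quick way to confirm which convention the device uses.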
Data Issues
No Data Appearing
Symptoms:
- Dashboard shows no data
- Graphs are empty
- "No data" messages
Solutions:
- Verify data source connection
  # Check if data is being collected
  docker compose logs opcua-server | tail -50
- Check storage service
  docker compose logs storage-service
  # Query recent data
  docker compose exec storage-service sqlite3 /data/telemetry.db \
    "SELECT * FROM sensor_data ORDER BY timestamp DESC LIMIT 10"
- Verify tag subscription
  - Ensure tags are correctly configured
  - Check tag paths in data source settings
- Check Node-RED flows
  # Open Node-RED editor
  http://DEVICE_IP:1880
  # Check flow status
  # Look for error indicators
Incorrect Data Values
Symptoms:
- Values don't match device display
- Random or nonsensical numbers
- Off by factor of 10/100/1000
Solutions:
- Check data type configuration
  - Verify int16 vs uint16 vs float32
  - Match to device documentation
- Verify byte order
  # Common configurations
  byte_order: big_endian # AB CD
  byte_order: little_endian # CD AB
  byte_order: mid_big_endian # BA DC
  byte_order: mid_little_endian # DC BA
- Check scale factor
  scale: 0.1 # Divide raw value by 10
  scale: 10 # Multiply raw value by 10
  offset: -40 # Subtract 40 from value
- Verify register address
  - Off-by-one errors are common
  - Check device documentation carefully
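To see how byte order and scaling change a reading, it can help to decode a known value by hand. The sketch below decodes two 16-bit registers as a float32 under each of the four byte arrangements, then applies a scale and offset. It identifies arrangements by their letter pattern (ABCD, CDAB, BADC, DCBA) rather than by setting name, because names such as "little endian" are used differently between tools. The function names are hypothetical, and applying the scale before the offset is an assumption; this guide does not state the order TwinEdge uses.

```python
import struct


def decode_float32(first_reg: int, second_reg: int, pattern: str = "ABCD") -> float:
    """Decode two 16-bit registers, in the order received, as an IEEE 754 float32.

    pattern describes where bytes A, B, C, D of the big-endian float
    ended up on the wire, e.g. "CDAB" means the two words are swapped.
    """
    wire = struct.pack(">HH", first_reg, second_reg)  # four bytes as received
    canonical = bytes(wire[pattern.index(letter)] for letter in "ABCD")
    return struct.unpack(">f", canonical)[0]


def apply_scaling(raw: float, scale: float = 1.0, offset: float = 0.0) -> float:
    """Assumed order: multiply by scale first, then add offset."""
    return raw * scale + offset


if __name__ == "__main__":
    # Try every arrangement on the same pair of registers; usually only
    # one of the four produces a plausible value.
    for pattern in ("ABCD", "CDAB", "BADC", "DCBA"):
        print(pattern, decode_float32(0x4148, 0x0000, pattern))
```

With the registers `0x4148, 0x0000`, the `ABCD` arrangement decodes to 12.5, while the others give implausibly tiny numbers. Nonsensical values of that kind usually point to a byte order mismatch, whereas a value that is off by exactly 10, 100, or 1000 points to the scale factor.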
Data Delays
Symptoms:
- Dashboard shows old data
- Significant lag between device and display
- Updates come in batches
Solutions:
- Check polling interval
  polling_interval_ms: 1000 # Reduce for faster updates
- Check network latency
  ping -c 10 OPC_UA_SERVER
- Review queue settings
  docker compose logs -f node-red | grep queue
- Check storage service performance
  docker stats storage-service
Service Issues
Service Won't Start
Symptoms:
- Container keeps restarting
- "Exit code 1" in docker ps
- Service unavailable
Solutions:
- Check container logs
  docker compose logs SERVICE_NAME
  docker compose logs --tail 100 SERVICE_NAME
- Check for port conflicts
  netstat -tulpn | grep PORT_NUMBER
- Verify configuration files
  docker compose config # Check for YAML errors
- Check resource limits
  docker stats
  free -h
- Restart the service
  docker compose restart SERVICE_NAME
- Recreate the container
  docker compose up -d --force-recreate SERVICE_NAME
High CPU Usage
Symptoms:
- Device running hot
- Slow response times
- Fan running constantly
Solutions:
- Identify culprit
  docker stats
  top
  htop
- Check for runaway processes
  docker compose logs SERVICE_NAME | grep -i error
- Reduce polling frequency
  polling_interval_ms: 5000 # Increase from 1000
- Disable unused services
  docker compose stop ml-inference # If not using ML
High Memory Usage
Symptoms:
- Out of memory errors
- Services being killed
- System slowdown
Solutions:
- Check memory usage
  docker stats --no-stream
  free -h
- Set memory limits
  # docker-compose.yml
  services:
    ml-inference:
      deploy:
        resources:
          limits:
            memory: 512M
- Clear old data
  # Clear old telemetry (keeps last 7 days)
  docker compose exec storage-service python cleanup.py --days 7
- Restart services to release memory
  docker compose restart
Disk Space Issues
Symptoms:
- "No space left on device" errors
- Services failing to write
- Database errors
Solutions:
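- Check disk usage
  df -h
  du -sh /opt/twinedge/*
- Clear Docker resources
  # Removes stopped containers, unused networks, and all unused images
  docker system prune -a
  # Removes unused volumes; confirm none of them hold data you need
  docker volume prune
- Clear old logs
  # Save the last 1000 lines, then empty the container's log file (requires root)
  docker compose logs --no-log-prefix SERVICE | tail -1000 > temp.log
  truncate -s 0 /var/lib/docker/containers/*/CONTAINER_ID-json.log
- Reduce data retention
  # Reduce from 30 to 7 days
  retention_days: 7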
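Disk problems are easier to handle before the disk is full. As a rough sketch, the check below reports how full a filesystem is and maps the result to the status levels used in this guide. The function names and the 80% and 90% thresholds are illustrative assumptions, not TwinEdge defaults, and the percentage can differ slightly from `df -h` because of reserved blocks.

```python
import shutil


def percent_used(path: str = "/") -> float:
    """Return the percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify_usage(percent: float, warn_at: float = 80.0, critical_at: float = 90.0) -> str:
    """Map a usage percentage to a status level (thresholds are assumptions)."""
    if percent >= critical_at:
        return "Critical"
    if percent >= warn_at:
        return "Warning"
    return "Healthy"


if __name__ == "__main__":
    # The root filesystem is used here; pass another path to check a data disk.
    used = percent_used("/")
    print(f"{used:.1f}% used: {classify_usage(used)}")
```

Running this from cron or a systemd timer gives an early warning, so that retention can be reduced before services start failing to write.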
Alert Issues
Alerts Not Triggering
Symptoms:
- Conditions met but no alert
- Alert history empty
- No notifications
Solutions:
- Verify alert configuration
  - Check condition thresholds
  - Verify data source is correct
- Check alert service
  docker compose logs alert-service
- Verify data is flowing
  - Check if tag has recent data
  - Verify tag name matches exactly
- Check notification channels
  - Verify email/Slack configuration
  - Test notifications manually
Too Many Alerts
Symptoms:
- Alert fatigue
- Constant notifications
- Flapping alerts
Solutions:
- Adjust thresholds
  - Increase threshold values
  - Add hysteresis
- Add duration requirement
  duration_seconds: 60 # Must persist for 1 minute
- Use deadband
  deadband: 5 # Ignore changes < 5%
- Implement alert grouping
  - Group related alerts
  - Rate limit notifications
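Hysteresis and a duration requirement work together to stop flapping: the alert fires only after the value has stayed above the threshold for a set time, and clears only once the value drops below a second, lower threshold. The sketch below illustrates that logic in isolation. The class and field names are hypothetical and do not describe how the TwinEdge alert service is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdAlert:
    high: float              # condition is met when value >= high
    clear: float             # active alert clears when value <= clear
    duration_seconds: float  # condition must hold this long before firing
    active: bool = False
    _since: Optional[float] = None  # time the condition was first met

    def update(self, value: float, now: float) -> bool:
        """Feed one sample; return True only on the sample that fires the alert."""
        if self.active:
            # Hysteresis: stay active until the value falls to the clear level.
            if value <= self.clear:
                self.active = False
                self._since = None
            return False
        if value >= self.high:
            if self._since is None:
                self._since = now
            if now - self._since >= self.duration_seconds:
                self.active = True
                return True
        else:
            # Condition interrupted; the duration timer starts over.
            self._since = None
        return False
```

With `ThresholdAlert(high=80, clear=75, duration_seconds=60)`, a value that briefly spikes to 85 and drops back produces no alert, and a value hovering between 76 and 85 after the alert fires produces no repeat notifications until it has first fallen to 75 or below.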
ML Inference Issues
Model Not Loading
Symptoms:
- "Model not found" errors
- Inference returns errors
- No predictions
Solutions:
- Check model file exists
  ls -la /opt/twinedge/models/
- Verify ONNX format
  python -c "import onnx; onnx.checker.check_model('/opt/twinedge/models/model.onnx')"
- Check ML service logs
  docker compose logs ml-inference
- Verify input shape
  - Check model expects correct number of features
  - Verify feature names match
Slow Inference
Symptoms:
- Predictions delayed
- High latency
- Timeouts
Solutions:
- Check model size
  ls -lh /opt/twinedge/models/
- Use quantized model
  - Use INT8 instead of FP32
  - Reduces size and speeds inference
- Reduce batch size
  batch_size: 1 # Reduce from 10
- Check CPU load
  docker stats ml-inference
Getting Help
Collecting Diagnostics
Before contacting support, collect:
# Run full diagnostic
twinedge diagnose --full > diagnostics.txt
# Or manually
docker compose ps > docker_status.txt
docker compose logs > docker_logs.txt
dmesg > system_log.txt
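If `twinedge diagnose` is unavailable, the manual commands above can be wrapped in a small script that keeps going when one of them fails, so a broken Docker daemon does not prevent the system log from being captured. This is a sketch only. The `collect` helper is hypothetical, and the Docker commands assume the script is run from the directory containing the compose file.

```python
import subprocess
from pathlib import Path

# Output file names and commands mirror the manual steps above.
COMMANDS = {
    "docker_status.txt": ["docker", "compose", "ps"],
    "docker_logs.txt": ["docker", "compose", "logs"],
    "system_log.txt": ["dmesg"],
}


def collect(commands: dict, out_dir: str) -> dict:
    """Run each command and write its output to out_dir/<filename>.

    Returns a dict of filename -> exit code, or None if the command
    could not be run at all (for example, the binary is missing).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    for filename, argv in commands.items():
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=120)
            body = proc.stdout + proc.stderr
            results[filename] = proc.returncode
        except (OSError, subprocess.TimeoutExpired) as exc:
            body = f"failed to run {argv}: {exc}\n"
            results[filename] = None
        (out / filename).write_text(body)
    return results
```

Calling `collect(COMMANDS, "diagnostics")` leaves one file per command in a `diagnostics` directory, which can then be archived and attached to a support request.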
Support Channels
- Documentation: docs.twinedgeai.com
- Community: discord.gg/twinedge
- Email: support@twinedgeai.com
- Enterprise: Dedicated support portal
Information to Include
- Device type and specs
- TwinEdge version
- Error messages (exact text)
- Steps to reproduce
- Diagnostic output
- Recent changes made
Next Steps
- Installation - Reinstall if needed
- Protocols - Protocol-specific troubleshooting
- OTA Updates - Update to fix known issues