Linux Server Error Troubleshooting: PostgreSQL, Systemd, Permissions, Network
Introduction
This guide walks through common production errors using real error output, in a copy-paste style aimed at Linux sysadmins.
Error #1: PostgreSQL Connection Refused
The Error Output
$ psql -h localhost -U postgres
psql: error: connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
Or from application logs:
django.db.utils.OperationalError: connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
When This Happens
- After server reboot
- Fresh PostgreSQL installation
- Changing configuration files
- After system updates
- Firewall changes
What It Actually Means
Not an authentication problem. The connection can't even be established.
Real cause: PostgreSQL daemon (postgres) is either:
- Not running at all
- Running but not listening on expected port/interface
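- Blocked by a firewall that rejects the connection (a firewall that silently drops packets usually shows up as a timeout, not "Connection refused")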
Step 1: Check if PostgreSQL is Running
sudo systemctl status postgresql
Example Output (Not Running):
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service)
Active: inactive (dead)
Example Output (Running):
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service)
Active: active (exited) since Mon 2025-03-10 10:00:00 UTC; 5min ago
If inactive (dead), go to Solution A.
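If active but still connection refused, go to Step 2. On Debian/Ubuntu, postgresql.service is an umbrella unit that reports active (exited) whether or not the cluster is up; the cluster itself runs as postgresql@14-main.service, so check that unit as well.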
Step 2: Check What Port PostgreSQL is Listening On
sudo ss -lntp | grep postgres
Example Output (Listening on all interfaces):
LISTEN 0 128 0.0.0.0:5432 0.0.0.0:* users:(("postgres",pid=1234,fd=5))
LISTEN 0 128 [::]:5432 [::]:* users:(("postgres",pid=1234,fd=6))
Example Output (Listening only on localhost):
LISTEN 0 128 127.0.0.1:5432 0.0.0.0:* users:(("postgres",pid=1234,fd=5))
Example Output (Not listening at all):
(no output)
If no output, PostgreSQL is running but not listening properly. Check logs (Step 3).
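The three cases above can also be told apart mechanically. The following is a minimal sketch, not part of the original procedure: classify_listen is a hypothetical helper name, and it assumes the ss -lnt column layout shown in the examples (local address:port in the fourth column).

```shell
# classify_listen PORT
# Reads `ss -lnt` style output on stdin and reports how PORT is bound.
# Hypothetical helper; assumes the local address:port is the 4th column.
classify_listen() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      n = split($4, parts, ":")
      if (parts[n] != port) next
      # Everything before the final ":port" is the bind address.
      addr = substr($4, 1, length($4) - length(parts[n]) - 1)
      if (addr == "0.0.0.0" || addr == "[::]" || addr == "*") wildcard = 1
      else if (addr == "127.0.0.1" || addr == "[::1]") loopback = 1
      else specific = 1
    }
    END {
      if (wildcard)      print "all-interfaces"
      else if (specific) print "specific-address"
      else if (loopback) print "loopback-only"
      else               print "not-listening"
    }
  '
}

# Example usage on a live host:
#   sudo ss -lnt | classify_listen 5432
```

A result of loopback-only while remote clients are refused points at Solution B; not-listening points at the logs in Step 3.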
Step 3: Check PostgreSQL Logs
# Ubuntu/Debian
sudo tail -50 /var/log/postgresql/postgresql-14-main.log
# CentOS/RHEL
sudo tail -50 /var/lib/pgsql/data/log/postgresql-*.log
# Via journalctl
sudo journalctl -u postgresql -n 50
Look for errors like:
FATAL: could not create lock file "/var/run/postgresql/.s.PGSQL.5432.lock": Permission denied
or
FATAL: data directory "/var/lib/postgresql/14/main" has wrong ownership
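When a log is long, it can help to collapse repeated failures into one line each. This is a small sketch under the assumption that fatal entries contain "FATAL: " or "PANIC: " as in the examples above; pg_fatal_summary is a hypothetical name, not a PostgreSQL tool.

```shell
# pg_fatal_summary
# Reads PostgreSQL log text on stdin and prints each distinct FATAL/PANIC
# message with how often it occurred, most frequent first.
pg_fatal_summary() {
  { grep -oE '(FATAL|PANIC): .*' || true; } |
    sort | uniq -c | sort -rn |
    sed -E 's/^ *([0-9]+) /\1x /'
}

# Example usage on a live host:
#   sudo journalctl -u postgresql -n 200 --no-pager | pg_fatal_summary
```

Timestamps and PIDs are dropped before counting, so the same error logged on every restart attempt shows up once with a count.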
Solution A: PostgreSQL Not Running (Start It)
# Start service
sudo systemctl start postgresql
# Enable on boot
sudo systemctl enable postgresql
# Verify
sudo systemctl status postgresql
Solution B: PostgreSQL Listening on Wrong Interface
Problem: listen_addresses is set to localhost only, but you're connecting from another machine.
Edit PostgreSQL config:
# Find config file
sudo -u postgres psql -c "SHOW config_file;"
# Output: /etc/postgresql/14/main/postgresql.conf
# Edit it
sudo nano /etc/postgresql/14/main/postgresql.conf
Find line:
listen_addresses = 'localhost'
Change to:
listen_addresses = '*'
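Restart (a reload is not enough for listen_addresses; remote clients also need a matching host line in pg_hba.conf, see Error #2):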
sudo systemctl restart postgresql
Solution C: Firewall Blocking
Check if firewall is running:
# Ubuntu/Debian (UFW)
sudo ufw status
# CentOS/RHEL (firewalld)
sudo firewall-cmd --list-all
Allow PostgreSQL port:
# UFW
sudo ufw allow 5432/tcp
# firewalld
sudo firewall-cmd --add-port=5432/tcp --permanent
sudo firewall-cmd --reload
Verification
# Test local connection
psql -h localhost -U postgres
# Test from remote machine
psql -h <server-ip> -U postgres
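For scripted checks, pg_isready (shipped with the PostgreSQL client tools) reports reachability without needing credentials, using exit statuses 0 to 3. The sketch below only translates those statuses into text; explain_pg_isready is a hypothetical helper name.

```shell
# explain_pg_isready STATUS
# Translates a pg_isready exit status into a short explanation.
explain_pg_isready() {
  case "$1" in
    0) echo "accepting connections" ;;
    1) echo "server is up but rejecting connections (for example, still starting)" ;;
    2) echo "no response: nothing is answering on that host/port" ;;
    3) echo "no attempt made: check the arguments passed to pg_isready" ;;
    *) echo "unexpected exit status: $1" ;;
  esac
}

# Example usage on a host with the client tools installed:
#   pg_isready -h localhost -p 5432; explain_pg_isready $?
```

Status 2 corresponds to the "Connection refused" case from this section; status 0 followed by a psql failure means the remaining problem is authentication, not connectivity.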
Common Mistakes
❌ Restarting before checking logs - You miss the root cause
❌ Assuming it's authentication - Connection refused ≠ auth failed
❌ Opening firewall without checking if service is running
❌ Editing wrong config file (PostgreSQL can have multiple versions)
Prevention
- Enable service on boot:
sudo systemctl enable postgresql
- Monitor service status:
# Add to cron for alert
systemctl is-active postgresql || echo "PostgreSQL DOWN" | mail -s "Alert" admin@example.com
- Check logs after config changes
Error #2: PostgreSQL Password Authentication Failed
The Error Output
$ psql -h localhost -U myuser -d mydb
Password for user myuser:
psql: error: connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL: password authentication failed for user "myuser"
Or in application logs:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: password authentication failed for user "django_user"
When This Happens
- After changing user passwords
- Fresh database setup
- Migrating from development to production
- After editing pg_hba.conf
What It Actually Means
The connection succeeded, but authentication failed. Either:
- Wrong password
- User doesn't exist
- pg_hba.conf has wrong authentication method
- User exists but not in the right database
Step 1: Check if User Exists
sudo -u postgres psql -c "\du"
Example Output:
List of roles
Role name | Attributes
-----------+------------------------------------------------------------
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS
myuser |
If myuser is not in the list, the user doesn't exist. Go to Solution A.
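The same check can be scripted. This is a minimal sketch that scans \du output for a role name; role_listed is a hypothetical helper, and it assumes the pipe-separated table layout shown in the example output.

```shell
# role_listed ROLE
# Reads `psql -c "\du"` output on stdin and prints "yes" if ROLE appears
# in the first column, otherwise "no".
role_listed() {
  awk -F'|' -v role="$1" '
    NF >= 2 {
      name = $1
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the name
      if (name == role) found = 1
    }
    END { print (found ? "yes" : "no") }
  '
}

# Example usage on a live host:
#   sudo -u postgres psql -c "\du" | role_listed myuser
# An alternative that queries the catalog directly:
#   sudo -u postgres psql -tAc "SELECT 1 FROM pg_roles WHERE rolname = 'myuser'"
```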
Step 2: Check pg_hba.conf
# Find pg_hba.conf location
sudo -u postgres psql -c "SHOW hba_file;"
# View it
sudo cat /etc/postgresql/14/main/pg_hba.conf | grep -v "^#" | grep -v "^$"
Example Output:
local all postgres peer
local all all peer
host all all 127.0.0.1/32 md5
host all all ::1/128 md5
Key lines explained:
- local ... peer = Local connections use system user authentication
- host ... md5 = TCP/IP connections require password (encrypted)
- host ... trust = No password required (DANGEROUS!)
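PostgreSQL applies the first pg_hba.conf line that matches the connection, so the order of lines matters as much as their content. The sketch below is a deliberately simplified matcher to illustrate that rule: hba_first_match is a hypothetical helper, it ignores the address column, does not handle comma-separated lists or keywords other than all, and assumes the method is the last field on the line.

```shell
# hba_first_match TYPE DATABASE USER
# Reads pg_hba.conf text on stdin and prints the auth method of the first
# line whose type, database and user fields match. Simplified on purpose:
# addresses are not checked and only the keyword "all" is understood.
hba_first_match() {
  awk -v t="$1" -v d="$2" -v u="$3" '
    /^[ \t]*(#|$)/ { next }                      # skip comments and blanks
    $1 == t && ($2 == "all" || $2 == d) && ($3 == "all" || $3 == u) {
      print $NF
      exit
    }
  '
}

# Example usage on a live host:
#   sudo cat /etc/postgresql/14/main/pg_hba.conf | hba_first_match local mydb myuser
```

With the example file above, a socket connection as myuser resolves to peer and a TCP connection resolves to md5, which is why psql with and without -h can behave differently for the same user.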
Step 3: Test Connection as postgres User
sudo -u postgres psql
If this works, the issue is with your specific user credentials.
Solution A: User Doesn't Exist (Create It)
# Connect as postgres
sudo -u postgres psql
-- Create user with password (run inside psql; SQL comments start with --)
CREATE USER myuser WITH PASSWORD 'secure_password';
-- Grant privileges on the database
GRANT ALL PRIVILEGES ON DATABASE mydb TO myuser;
-- Exit
\q
Solution B: Wrong Password (Reset It)
# Connect as postgres
sudo -u postgres psql
-- Change password (run inside psql)
ALTER USER myuser WITH PASSWORD 'new_secure_password';
-- Exit
\q
Solution C: Wrong Authentication Method in pg_hba.conf
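Problem: A local line uses peer, so Unix-socket connections (psql without -h) are checked against your OS username and never ask for a password. TCP connections (psql -h localhost) only ever match host lines.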
Edit pg_hba.conf:
sudo nano /etc/postgresql/14/main/pg_hba.conf
Change this:
local all myuser peer
To this (place it above the generic "local all all" line, because the first matching line wins):
local all myuser md5
Or to allow network connections:
host mydb myuser 192.168.1.0/24 md5
Reload config (don't need full restart):
sudo systemctl reload postgresql
Solution D: User Doesn't Have Access to Database
# Connect as postgres to the target database (table and sequence grants apply to the database you are connected to)
sudo -u postgres psql -d mydb
-- Grant access
GRANT CONNECT ON DATABASE mydb TO myuser;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO myuser;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO myuser;
-- Make it default for future tables created by the current role
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO myuser;
\q
Verification
# Test connection
psql -h localhost -U myuser -d mydb
# Should connect without error
Common Mistakes
❌ Using wrong username (postgres vs myuser)
❌ Editing pg_hba.conf but not reloading (systemctl reload postgresql)
❌ Confusing local with host (local = Unix socket, host = TCP/IP)
❌ Granting privileges to wrong database
❌ Using trust in production (no password required = security risk)
Prevention
- Use strong passwords:
# Generate random password
openssl rand -base64 32
- Restrict access by IP in pg_hba.conf:
host mydb myuser 10.0.1.0/24 md5
- Use separate users for each app (don't share credentials)
- Document credentials in a password manager, not plain text files
Error #3: Systemd Service Failed to Start
The Error Output
$ sudo systemctl start myapp
Job for myapp.service failed because the control process exited with error code.
See "systemctl status myapp.service" and "journalctl -xe" for details.
$ sudo systemctl status myapp
● myapp.service - My Application
Loaded: loaded (/etc/systemd/system/myapp.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2025-03-10 10:00:00 UTC; 5s ago
Process: 12345 ExecStart=/usr/bin/node /opt/myapp/server.js (code=exited, status=1/FAILURE)
Main PID: 12345 (code=exited, status=1/FAILURE)
Mar 10 10:00:00 server systemd[1]: myapp.service: Main process exited, code=exited, status=1/FAILURE
Mar 10 10:00:00 server systemd[1]: myapp.service: Failed with result 'exit-code'.
Mar 10 10:00:00 server systemd[1]: Failed to start My Application.
When This Happens
- After creating new systemd service
- After editing unit file
- After system updates
- When dependencies fail
- Permission issues
What It Actually Means
Your service's main process started but immediately crashed (exit code 1 = general error).
Not a systemd problem - your application has a bug or misconfiguration.
Step 1: Check Detailed Logs
# Last 50 lines of service logs
sudo journalctl -u myapp -n 50 --no-pager
# Follow logs in real-time
sudo journalctl -u myapp -f
Example Output:
Mar 10 10:00:00 server node[12345]: Error: Cannot find module 'express'
Mar 10 10:00:00 server node[12345]: at Function.Module._resolveFilename (internal/modules/cjs/loader.js:880:15)
Mar 10 10:00:00 server systemd[1]: myapp.service: Main process exited, code=exited, status=1/FAILURE
Root cause: Missing dependencies.
Step 2: Check Unit File Syntax
# View unit file
sudo systemctl cat myapp
# Check for syntax errors
sudo systemd-analyze verify myapp.service
Step 3: Try Running Command Manually
If ExecStart=/usr/bin/node /opt/myapp/server.js, run it manually:
# Switch to service user first
sudo -u myappuser /usr/bin/node /opt/myapp/server.js
This shows you the actual error without systemd wrapping it.
Solution A: Missing Dependencies
For Node.js:
cd /opt/myapp
sudo -u myappuser npm install
For Python:
cd /opt/myapp
sudo -u myappuser pip3 install -r requirements.txt
Solution B: Permission Denied
Error in logs:
EACCES: permission denied, open '/opt/myapp/data/database.db'
Fix ownership:
sudo chown -R myappuser:myappuser /opt/myapp
Fix permissions:
# Directories: 755
sudo find /opt/myapp -type d -exec chmod 755 {} \;
# Files: 644
sudo find /opt/myapp -type f -exec chmod 644 {} \;
# Main executable: 755
sudo chmod 755 /opt/myapp/server.js
Solution C: Port Already in Use
Error in logs:
Error: listen EADDRINUSE: address already in use :::3000
Find what's using the port:
sudo ss -lntp | grep :3000
Output:
LISTEN 0 128 *:3000 *:* users:(("node",pid=8888,fd=10))
Kill it:
sudo kill 8888
# Or if it's an old instance of same service
sudo systemctl stop myapp
sudo systemctl start myapp
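Before killing anything, it is worth extracting the PID from the ss output and checking what the process actually is. The following sketch assumes the ss -lntp layout shown above; pids_on_port is a hypothetical helper name.

```shell
# pids_on_port PORT
# Reads `ss -lntp` style output on stdin and prints the unique PIDs of
# processes listening on PORT, one per line.
pids_on_port() {
  awk -v port="$1" '
    $1 == "LISTEN" { n = split($4, p, ":"); if (p[n] == port) print }
  ' | { grep -o 'pid=[0-9]*' || true; } | cut -d= -f2 | sort -u
}

# Example usage on a live host:
#   sudo ss -lntp | pids_on_port 3000
#   ps -p 8888 -o pid,user,args     # inspect the process before killing it
```

Matching on the exact port field avoids false hits such as port 30001 that a plain grep :3000 would also match.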
Solution D: Environment Variables Missing
Error in logs:
Error: DATABASE_URL is not defined
Add to unit file:
sudo nano /etc/systemd/system/myapp.service
Add under [Service]:
[Service]
Environment="DATABASE_URL=postgresql://user:pass@localhost/db"
Environment="NODE_ENV=production"
Or use EnvironmentFile:
[Service]
EnvironmentFile=/etc/myapp/env
Then create /etc/myapp/env:
DATABASE_URL=postgresql://user:pass@localhost/db
NODE_ENV=production
Reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart myapp
Solution E: Working Directory Wrong
Error:
Error: ENOENT: no such file or directory, open 'config.json'
Fix: Add WorkingDirectory to unit file:
[Service]
WorkingDirectory=/opt/myapp
ExecStart=/usr/bin/node server.js
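The log signatures from Solutions A to E can be collected into a small triage helper. This is only a sketch: suggest_fix is a hypothetical name, the patterns are the Node.js messages quoted in this section, and a match is a hint to investigate, not a diagnosis.

```shell
# suggest_fix "LOG LINE"
# Maps an error line from the journal to the matching solution in this
# section. Patterns are taken from the example messages above.
suggest_fix() {
  case "$1" in
    *"Cannot find module"*) echo "Solution A: install missing dependencies" ;;
    *EACCES*)               echo "Solution B: fix ownership and permissions" ;;
    *EADDRINUSE*)           echo "Solution C: free the port" ;;
    *"is not defined"*)     echo "Solution D: set the missing environment variable" ;;
    *ENOENT*)               echo "Solution E: set WorkingDirectory" ;;
    *)                      echo "No known pattern: read the full journal output" ;;
  esac
}

# Example usage on a live host (checks the most recent log line):
#   suggest_fix "$(sudo journalctl -u myapp -n 1 --no-pager -o cat)"
```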
Verification
# Check status
sudo systemctl status myapp
# Should show:
# Active: active (running)
# Test functionality
curl http://localhost:3000
Common Mistakes
❌ Not running daemon-reload after editing unit file
❌ Wrong user in User= directive (file permissions won't match)
❌ Absolute paths not used in ExecStart (use /usr/bin/node, not just node)
❌ Missing WorkingDirectory when app expects relative paths
❌ Not setting environment variables
Prevention
- Template unit file:
[Unit]
Description=My Application
After=network.target postgresql.service
[Service]
Type=simple
User=myappuser
WorkingDirectory=/opt/myapp
EnvironmentFile=/etc/myapp/env
ExecStart=/usr/bin/node server.js
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
- Always test manually first:
sudo -u myappuser /usr/bin/node /opt/myapp/server.js
- Use Restart=always for auto-recovery
Error #4: Permission Denied on File/Directory
The Error Output
$ cat /var/log/nginx/access.log
cat: /var/log/nginx/access.log: Permission denied
$ mkdir /opt/myapp/data
mkdir: cannot create directory '/opt/myapp/data': Permission denied
$ ./deploy.sh
bash: ./deploy.sh: Permission denied
When This Happens
- Accessing files owned by other users
- Creating files in restricted directories
- Running scripts without execute permission
- After changing ownership
What It Actually Means
Linux permission system (user/group/others, rwx) is blocking your action.
Step 1: Check File Permissions
ls -lah /var/log/nginx/access.log
Example Output:
-rw-r----- 1 www-data adm 12M Mar 10 10:00 /var/log/nginx/access.log
Breakdown:
- First character - = regular file
- rw- = owner (www-data) can read and write
- r-- = group (adm) can read
- --- = others have no access
Your user is not www-data and not in the adm group, so you can't read it.
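The breakdown of the mode string can be reproduced with a few lines of shell. This sketch splits a ten-character mode such as -rw-r----- into its four parts; explain_mode is a hypothetical helper, and it does not interpret special bits (setuid, setgid, sticky) or ACL markers.

```shell
# explain_mode MODE
# Splits an `ls -l` style mode string into file type and the three
# permission triplets. The variables it sets are global (no `local`).
explain_mode() {
  mode=$1
  case "$mode" in
    -*) kind="regular file" ;;
    d*) kind="directory" ;;
    l*) kind="symbolic link" ;;
    *)  kind="other" ;;
  esac
  echo "type: $kind"
  echo "owner: $(printf '%s' "$mode" | cut -c2-4)"
  echo "group: $(printf '%s' "$mode" | cut -c5-7)"
  echo "others: $(printf '%s' "$mode" | cut -c8-10)"
}

# Example usage with GNU stat, which prints the mode string via %A:
#   explain_mode "$(stat -c %A /var/log/nginx/access.log)"
```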
Step 2: Check Your User and Groups
whoami
# Output: deploy
groups
# Output: deploy
Solution A: Add User to Required Group
# Add your user to 'adm' group
sudo usermod -aG adm deploy
# Logout and login for changes to take effect
exit
# Re-login via SSH
ssh deploy@server
# Verify
groups
# Output: deploy adm
Solution B: Use sudo for Temporary Access
sudo cat /var/log/nginx/access.log
Warning: Don't use sudo for everything. Understand why permission is denied.
Solution C: Change File Ownership
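Only if you should own the path. In the mkdir example above the directory does not exist yet, so create it first:
sudo mkdir -p /opt/myapp/data
sudo chown deploy:deploy /opt/myapp/data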
Solution D: Fix Execute Permission on Script
# Check current permission
ls -l deploy.sh
# -rw-r--r-- 1 deploy deploy 1234 Mar 10 10:00 deploy.sh
# Add execute permission
chmod +x deploy.sh
# Verify
ls -l deploy.sh
# -rwxr-xr-x 1 deploy deploy 1234 Mar 10 10:00 deploy.sh
# Now run it
./deploy.sh
Verification
# Test access
cat /var/log/nginx/access.log
# Should work without error
Common Mistakes
❌ Using chmod 777 on everything (massive security risk)
❌ Running everything with sudo (masks permission issues)
❌ Changing ownership of system files (/var/log should stay root/www-data)
❌ Not logging out after usermod -aG (group changes need new session)
Prevention
- Use proper groups for shared access
- Set up ACLs for complex permissions
- Never chmod 777 in production
- Document why specific permissions are set
Error #5: No Space Left on Device
The Error Output
$ echo "test" > file.txt
bash: file.txt: No space left on device
$ sudo apt update
E: Write error - write (28: No space left on device)
$ docker pull nginx
Error response from daemon: write /var/lib/docker/tmp/GetImageBlob123: no space left on device
Or in application logs:
IOError: [Errno 28] No space left on device: '/var/log/myapp/app.log'
When This Happens
- Logs filling up disk
- Docker images accumulating
- Database growing
- Temp files not cleaned
Step 1: Check Disk Usage
df -h
Example Output:
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 20G 20G 0 100% /
/dev/sdb1 100G 75G 25G 75% /data
Problem: Root filesystem is 100% full.
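On hosts with many mounts, the full filesystems can be filtered out of the df output automatically. The sketch below assumes the six-column layout shown above (df -P guarantees one line per filesystem); fs_over_threshold is a hypothetical helper name, and mount points containing spaces are not handled.

```shell
# fs_over_threshold PERCENT
# Reads `df -P` or `df -h` output on stdin and prints "mountpoint use%"
# for every filesystem at or above PERCENT.
fs_over_threshold() {
  awk -v limit="$1" '
    NR > 1 {                       # skip the header line
      pct = $5
      sub(/%/, "", pct)
      if (pct + 0 >= limit + 0) print $6, $5
    }
  '
}

# Example usage on a live host:
#   df -P | fs_over_threshold 90
```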
Step 2: Find Large Directories
cd /
# -x stays on this filesystem, so /data, /proc and other mounts are not counted
sudo du -xh --max-depth=1 | sort -hr | head -20
Example Output:
18G .
10G ./var
4G ./usr
2G ./home
1G ./opt
/var is the culprit. Dig deeper:
cd /var
sudo du -h --max-depth=1 | sort -hr | head -20
Output:
10G .
8G ./log
1.5G ./lib
500M ./cache
Step 3: Find Specific Large Files
sudo find /var/log -type f -size +100M -exec ls -lh {} \;
Example Output:
-rw-r----- 1 syslog adm 5.2G Mar 10 10:00 /var/log/syslog
-rw-r----- 1 www-data www-data 2.8G Mar 10 10:00 /var/log/nginx/access.log
Solution A: Clean System Logs
# Truncate large log files (don't delete!)
sudo truncate -s 0 /var/log/syslog
sudo truncate -s 0 /var/log/nginx/access.log
# Clean systemd journal
sudo journalctl --vacuum-time=7d
# Or by size
sudo journalctl --vacuum-size=500M
Solution B: Clean Package Manager Cache
# Ubuntu/Debian
sudo apt clean
sudo apt autoclean
sudo apt autoremove
# CentOS/RHEL
sudo yum clean all
Solution C: Clean Docker
# Remove stopped containers
docker container prune
# Remove unused images
docker image prune -a
# Remove everything unused (WARNING: --volumes also deletes volumes that no container is using, along with their data)
docker system prune -a --volumes
Solution D: Check for Deleted But Open Files
sudo lsof | grep deleted | grep -v '/tmp'
Example Output:
nginx 1234 www-data 3w REG 8,1 3221225472 deleted
Nginx has a 3GB deleted file still open!
Fix:
sudo systemctl restart nginx
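To judge whether a restart is worth it, the space held by deleted-but-open files can be totalled first. This is a rough sketch that assumes the column layout of the example line above (type in column 5, size in bytes in column 7); deleted_open_bytes is a hypothetical helper, and a file held open through several descriptors is counted once per line.

```shell
# deleted_open_bytes
# Reads lsof output on stdin and prints the total size in bytes of regular
# files that are marked as deleted. Assumes size is the 7th column.
deleted_open_bytes() {
  awk '$5 == "REG" && /deleted/ { total += $7 }
       END { printf "%.0f\n", total }'
}

# Example usage on a live host (+L1 lists open files with no remaining links):
#   sudo lsof +L1 | deleted_open_bytes
```

For the example line the total is 3221225472 bytes, which is the 3 GB that df counts as used even though no file is visible on disk.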
Verification
df -h
Output should show free space:
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 20G 12G 8G 60% /
Common Mistakes
❌ Deleting /var/log/* entirely (breaks services expecting log files)
❌ Not checking for deleted open files
❌ Forgetting Docker uses space
❌ No log rotation configured
Prevention
- Set up log rotation:
# /etc/logrotate.d/myapp
/var/log/myapp/*.log {
daily
rotate 7
compress
delaycompress
missingok
notifempty
create 0640 myapp myapp
}
- Monitor disk usage:
#!/bin/bash
# Cron alert script (the shebang must be the first line of the file)
USAGE=$(df -P / | awk 'NR==2 {gsub(/%/, "", $5); print $5}')
if [ "$USAGE" -gt 80 ]; then
  echo "Disk usage: ${USAGE}%" | mail -s "Disk Alert" admin@example.com
fi
- Automate Docker cleanup:
# Weekly cron (add --volumes only if nothing you need lives in unused volumes)
0 3 * * 0 docker system prune -f
Conclusion
Pattern for all errors:
- Read the exact error message
- Check service status and logs
- Verify configuration files
- Test manually before using systemd
- Fix root cause, not symptoms
- Verify the fix works
- Prevent recurrence with monitoring
Never:
- Restart without checking logs
- Use chmod 777 or chown -R root
- Skip verification step
- Assume cloud/container magic will fix it
Always:
- Read logs first
- Understand the error before Googling solutions
- Test in a safe way before production
- Document the fix for next time
Pro Tip: Create a /root/troubleshooting.md file with these commands. When under pressure at 3 AM, you'll thank yourself.