Defeated by NGINX

lal309@lemmy.world to Selfhosted@lemmy.world – 14 points –

Heads up! Long post and lots of head bashing against the wall.

Context:

I have written a Python app (Django). I have dockerized the deployment, and the compose file has three containers: app, nginx, and postgres. I'm currently trying to deploy a demo of it on a VPS running Debian 11. The information below has been redacted (IPs, domain name, etc.)

Problem:

I keep running into 502 errors. Locally everything works well, even with nginx (but running on port 80). For this deployment I'm trying to configure nginx as best I can, redirecting HTTP traffic to HTTPS with SSL certs. The nginx logs simply say: "connect() failed (111: Connection refused) while connecting to upstream, client: 1.2.3.4, server: demo.example.com, request: "GET / HTTP/1.1", upstream: "http://192.168.0.2:8020/", host: "demo.example.com"". I have tried just about everything.

What I've tried:

  • Adding my server block configs to /etc/nginx/conf.d/default.conf
  • Adding my server block configs to a new file at /etc/nginx/conf.d/app.conf and leaving default.conf at its out-of-the-box config.
  • Putting the above configs (default.conf and app.conf) in sites-available (/etc/nginx/sites-available/*, not at the same time though).
  • Recreating /etc/nginx/nginx.conf by copy/pasting the out-of-the-box nginx.conf and then adding the server blocks directly in nginx.conf.
  • Running nginx -t inside the nginx container (syntax and config were "successful").
  • Running nginx -T after recreating /etc/nginx/nginx.conf.
    • nginx -T, when the server blocks were in /etc/nginx/conf.d/*, led me to think that since there were two listen 80 server blocks, I should ensure only one was being read by the container; hence the recreated /etc/nginx/nginx.conf above.
  • Restarting the container each time a change was made.
  • Changing the user directive from nginx (no dice when using nginx as the user) to www-data, root, and nobody.
  • Deleting my entire Docker data and redeploying everything a few times.
  • Double-checking the upstream block 1,000 times.
  • Confirming the upstream container is running and on the right exposed port.
  • Checking access.log and error.log, but they were both empty (not sure why; tried cat and tail).
  • Probably forgetting more stuff (6 hours deep in the same error loop by now).

How can you help:

Please take a look at the nginx config below and see if you can spot a problem, PLEASE! This is my current /etc/nginx/nginx.conf:

```
user www-data;
worker_processes auto;

error_log /var/log/nginx/error.log notice;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    #tcp_nopush on;

    keepalive_timeout 65;

    #gzip on;

    upstream djangoapp {
        server app:8020;
    }

    server {
        listen 80;
        listen [::]:80;
        server_name demo.example.com;

        return 301 https://$host$request_uri;
    }

    server {
        listen 443 ssl;
        listen [::]:443 ssl;

        server_name demo.example.com;

        ssl_certificate /etc/letsencrypt/live/demo.example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/demo.example.com/privkey.pem;
        #ssl_protocols TLSv1.2 TLSv1.3;
        #ssl_prefer_server_ciphers on;

        location / {
            proxy_pass http://djangoapp;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            #proxy_set_header Upgrade $http_upgrade;
            #proxy_set_header Connection keep-alive;
            proxy_redirect off;
        }

        location /static/ {
            autoindex on;
            alias /static/;
        }
    }
}
```

  • EDIT: I have also confirmed that both containers are connected to the same docker network (docker network inspect frontend)

  • EDIT 2: Solved my problem. See my comments to @chaospatterns. TL;DR: there was an uncaught exception in the app, but it didn't crash the container. Had to dig deep into the logs to find it.


First the basics. "Connection refused" means that nothing is listening at "http://192.168.0.2:8020/".

  1. Is 192.168.0.2 the IP address of the Django container? If it's the host's IP, does docker ps show that the port is bound to the host? e.g. 0.0.0.0:8082->8082/tcp
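Both of those checks can be run directly with the docker CLI (the container name app is an assumption, taken from the compose file described above):

```shell
# Which IP does the app container have on its networks?
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' app

# Is the port bound to the host? The Ports column shows e.g. 0.0.0.0:8082->8082/tcp
docker ps --filter name=app --format '{{.Names}}  {{.Ports}}'
```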

Confirmed upstream block container is running and on the right exposed port

What steps did you do to confirm that this is running?

  1. 192.168.0.2 is the IP of the Django app container (checked with docker inspect app | grep IP; docker logs nginx also shows blah blah upstream http://192.168.0.2:8020 blah blah).
  2. I created a "frontend" network. The nginx and app containers are both connected to this network, but only nginx has host port forwarding (0.0.0.0:80 and 0.0.0.0:443). The app container is set to EXPOSE 8020 in the Dockerfile and docker compose, and entrypoint.sh runs this line after the usual Django commands: gunicorn app.wsgi:application --user www-data --bind 0.0.0.0:8020 --workers 3.
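A minimal compose sketch of that layout (service and network names are assumptions matching the description above; in the real setup gunicorn is launched from entrypoint.sh rather than a command: line):

```yaml
services:
  nginx:
    image: nginx:stable
    ports:
      - "80:80"       # only nginx publishes ports to the host
      - "443:443"
    networks: [frontend]

  app:
    build: .
    expose:
      - "8020"        # reachable from the frontend network only, not the host
    # stand-in for the entrypoint.sh line:
    command: gunicorn app.wsgi:application --user www-data --bind 0.0.0.0:8020 --workers 3
    networks: [frontend]

networks:
  frontend:
```

With both services on the same user-defined network, nginx can resolve the hostname app, which is what the `server app:8020;` upstream relies on.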

SOLVED.. ALMOST THERE??? There were no signs of an issue in docker logs app until I scrolled all the way to the very top (way past all the successful migrations, tasks run on boot, and success messages). There was an uncaught exception booting the gunicorn workers, caused by a middleware package I had removed from my dependencies a few days ago. I searched through my code, removed all calls and settings for that middleware package, redeployed the app, and now I can hit the public page.
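The failure was only visible at the very top of the log stream. A sketch of how to surface it without scrolling (the container name app is an assumption; the grep pipeline is demonstrated on a simulated log so it can be verified locally):

```shell
# Against the real container you would run:
#   docker logs app 2>&1 | grep -n -m1 -i -E "traceback|error"
# The same pipeline, demonstrated on a fake log stream:
printf 'Applying migrations... OK\nTraceback (most recent call last):\nModuleNotFoundError\n' \
  | grep -n -m1 -i -E "traceback|error"
# → 2:Traceback (most recent call last):
```

`-m1` stops at the first hit and `-n` prints its line number, so a boot-time traceback buried under later success messages shows up immediately.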

What now? Now that everything looks like it's working, what is the best practice for nginx.conf? Leave it all in /etc/nginx/nginx.conf (with user as root)? Reestablish the out-of-the-box nginx.conf and /etc/nginx/conf.d/default.conf and just override default.conf? Or add a secondary config like /etc/nginx/conf.d/app.conf and leave default.conf as configured out of the box?

There was an uncaught exception to boot gunicorn workers

That's odd that it didn't cause the Docker container to immediately exit.

What now? So now that it looks like everything is working. What is the best practice for the nginx.conf? Leave it all in /etc/nginx/nginx.conf (with user as root), reestablish the out box nginx.conf and /etc/nginx/conf.d/default.conf

My suggestion would be to create /etc/nginx/conf.d/mycooldjangoapp.conf. Compared to conf.d/default.conf, this is more intuitive if you start hosting multiple apps. Keep it out of nginx.conf, because apt-get and other package managers will usually patch that file on version changes, and again it gets confusing if you have multiple apps.
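A sketch of what that per-app file could hold, lifted from the nginx.conf posted above (the filename is the commenter's suggestion; the stock nginx.conf already does include /etc/nginx/conf.d/*.conf inside its http block, so only the app-specific parts move):

```
# /etc/nginx/conf.d/mycooldjangoapp.conf
upstream djangoapp {
    server app:8020;
}

server {
    listen 80;
    listen [::]:80;
    server_name demo.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name demo.example.com;

    ssl_certificate     /etc/letsencrypt/live/demo.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/demo.example.com/privkey.pem;

    location / {
        proxy_pass http://djangoapp;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Adding a second app then means adding a second conf.d file with its own upstream and server_name, with no edits to nginx.conf.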

Yeah, not sure why it didn't just crash instead of hiding behind all kinds of success messages.

Fair enough! If I create a secondary config as you're suggesting, wouldn't it conflict with the server blocks in default.conf? If I remember correctly, default.conf has a listen 80 server block pointing to localhost (which in my case wouldn't be the correct path, since the app is in another container), so wouldn't nginx get confused because it doesn't know which block to follow???

Or maybe I saw the block in default.conf but it was all commented out out of the box. Idk, I had to step away for a sec. As you can imagine, I've been bashing my head for hours and it turned out to be some BS I would have caught had I read the entire log stream. So I'm pretty angry/decompressing at the moment.

If I create a secondary config as you are suggesting, wouldn’t it create a conflict with the server blocks of default.conf

No, you can have multiple server blocks with the same listen directive. They just need to differ by their server_name, and only one server block can contain default_server. Reference

NGINX will use the server_name directives to differentiate between the backend services. This is a classic virtual host configuration model.
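A minimal sketch of that virtual-host model (hostnames and the upstream name are assumptions): both blocks listen on the same port, and nginx selects one by matching the request's Host header against server_name:

```
server {
    listen 80 default_server;   # only one block may be default_server
    server_name _;
    return 444;                 # close connections for unknown hosts
}

server {
    listen 80;
    server_name demo.example.com;   # chosen when Host: demo.example.com
    location / {
        proxy_pass http://djangoapp;
    }
}
```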

Alright I’ll give it a try and see what happens. Thanks for your help!

Assume nothing! Test every little assumption and you'll find the problem. Some things to get you started:

  • Does the "app" domain resolve to the app container's IP from within the nginx container?
  • Can you proxy_pass to the host:port directly rather than using an upstream definition? If not, what about IP:port?
  • Can you connect to the app container from outside (if exposed)? What about from inside the nginx container? What about inside the app container?
  • Is the http(s) connection to the server (demo.example.com) actually going to your nginx instance? Shut it down and see if it changes.
  • If it works locally on 80, can you get it to work on the VPS on 80?
  • Are you using the exact same docker-compose.yaml file for this as locally? If not, what's different?
  • Are you building the image? If so, are you incrementing the version number of the build so it gets updated?
  • Is there a firewall running on the host OS? If so, is it somehow interfering? Disable it and see.

While not a direct solution to your problem, I no longer manually configure my reverse proxies at all and use auto-configuring ones instead. The nginx-proxy image is great, along with its ACME companion image for automatic SSL cert generation - you'll be up and running in under 30 mins. I used that for a long time and it was great.

I've since moved to using Traefik as it's more flexible and offers more features, but it's a bit more involved to configure (simple, but the additional flexibility means everything requires more config).

That way you just bring up your container, the reverse proxy pulls metadata from it (e.g. the host to map, the email for certs), and off it goes.
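For nginx-proxy with the ACME companion, that metadata is passed as environment variables on the proxied service; a hedged compose fragment (service name, domain, and email are placeholders):

```yaml
services:
  app:
    build: .
    environment:
      VIRTUAL_HOST: demo.example.com      # nginx-proxy maps this host to the container
      VIRTUAL_PORT: "8020"                # port the app listens on inside the container
      LETSENCRYPT_HOST: demo.example.com  # ACME companion requests a cert for this host
      LETSENCRYPT_EMAIL: admin@example.com
```

Traefik follows the same idea but reads container labels instead of environment variables.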

Great pointers! Some of them I had done and triple-checked, and others are great future troubleshooting points. There was no way I was going to put hours of troubleshooting and checking in a post, so I tried to provide as much information as possible without putting up a giant wall of text.

Glad you sorted it though! It's a nightmare when you get such an opaque error and there's so many moving parts that could be responsible!

The one time that it wasn’t ….. DNS hahah

Make sure the docker containers are using the same network. If you didn't specify something, this should be the case for all three containers in the compose file.

Alright, now give every container a name of your choosing using the container_name field.

Lastly, change the nginx config to refer to the app container by name, but I think you already did that: upstream djangoapp { server container-name:port; }

No need to expose any ports except the 80 or 443 of the nginx container.

If you have issues, spin up a temporary Alpine container with a command like "tail -f /dev/null", and use "docker exec -it temp /bin/sh" (Alpine ships /bin/sh; bash would have to be installed) to install debugging tools for the connection (nc/netcat, curl, ...).
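As a concrete sketch (the network name frontend comes from the OP's edit; the container name temp and port 8020 match the thread; these need a live Docker daemon):

```shell
# Start a throwaway container on the app's network
docker run -d --name temp --network frontend alpine tail -f /dev/null

# Install debugging tools (nc comes with busybox; curl via apk)
docker exec temp apk add --no-cache curl

# Is the app port reachable by service name from inside the network?
docker exec temp nc -zv app 8020

# Clean up
docker rm -f temp
```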

Yeah, I always try to dedicate networks to each app; if it's a full-stack app, then one for the front end (nginx and app) and another for the back end (app and database).

I didn’t think about spinning up the alpine container to troubleshoot so that’s another great pointer for future soul crushing and head bashing sessions!

While not directly solving the problem, my usual setup involves installing webmin with nginx and managing my domains and nginx from there, changing the auto-generated domain configs to proxy to docker. I have yet to find an easier solution.
My biggest gripe with it is that it sets up a few things I don't need, like PHP, disk quotas, etc.