Something somewhere is running an htmlspecialchars() or equivalent on whatever you input, probably as an attempt at "sanitizing" the text entered in titles/posts/comments. You know, to keep me from just inserting a JavaScript tag with src='http://pwned.ru/fu.js' into a comment and having it do something naughty to anyone who loads the page.
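For anyone who hasn't seen what that kind of escaping does, here's a rough sketch using Python's `html.escape` as a stand-in for `htmlspecialchars()`. I have no idea what Lemmy actually calls; the comment body is just the example from above.

```python
import html

# Hypothetical comment body containing the script tag described above.
comment = "<script src='http://pwned.ru/fu.js'></script>"

# html.escape replaces &, < and > (and, by default, both quote characters)
# with entities, so a browser displays the text instead of running it.
escaped = html.escape(comment)
print(escaped)
```

Once escaped, the string no longer contains a literal `<`, so the browser never sees a tag to execute.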
I'm certain these are being stored in the database as &amp;, but they're not being decoded back into an ampersand character upon display.
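One way you could end up with a visible &amp; is escaping twice: once on the way into the database and again when building the page. This is only a guess at the mechanism, since I can't see the actual code. `html.unescape` stands in here for the single round of entity decoding a browser does.

```python
import html

title = "Tom & Jerry"

stored = html.escape(title)      # escaped once on the way into the database
rendered = html.escape(stored)   # escaped again when building the page

# A browser decodes entities exactly once; html.unescape mimics that here.
shown = html.unescape(rendered)
print(stored)
print(rendered)
print(shown)
```

The reader ends up looking at the entity text rather than the ampersand, which matches what people are seeing in titles.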
I know, just sanitize it again. .Replace("&amp;", "&"), Regex.Remove("amp;"), if(.Contains("amp;"))
The same with < and &lt; please
No. This is just escaped HTML, so you can unescape it like any other HTML.
Please be kidding lol
Would it not be possible for whatever's decoding it to run arbitrary JavaScript if done wrong? Maybe that's why it doesn't exist yet?
The decode really, really, really should not be happening client side in JavaScript. The backend should handle it before handing the text to the user's browser. You are correct: if this is done client side, it means a bad actor can mess with it and/or include an injection attack of some sort.
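To make the risk concrete, here's a small sketch of the difference between decoding and dumping the result into the page versus decoding and re-escaping exactly once. The function names `render_unsafe` and `render_safe` are made up for this example; neither is anything Lemmy provides.

```python
import html

# Hypothetical value as it might sit in the database, already escaped once.
stored = "&lt;script&gt;alert(1)&lt;/script&gt; &amp; friends"

def render_unsafe(text: str) -> str:
    # Decodes and drops the result straight into markup: the tag is live again.
    return "<p>" + html.unescape(text) + "</p>"

def render_safe(text: str) -> str:
    # Decode back to the raw characters, then escape exactly once for output.
    return "<p>" + html.escape(html.unescape(text)) + "</p>"

print(render_unsafe(stored))
print(render_safe(stored))
```

The unsafe version hands the browser a real script tag. The safe one still shows the ampersand correctly, because the browser does the one remaining decode itself.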
Nothing client side should ever handle user input, except perhaps convenience features like flagging incomplete fields or kicking the cursor to the next input element when one is full (e.g. for phone numbers). Anything client side can be fucked with by the client. Validation needs to happen on the server side, before committing the input to the database (or doing whatever it's going to do with it).
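A minimal sketch of what a server-side check might look like before anything gets committed. The `validate_title` function and the 200-character limit are assumptions for illustration, not Lemmy's real rules.

```python
import unicodedata

MAX_TITLE_LENGTH = 200  # assumed limit, not Lemmy's real one

def validate_title(raw: str) -> str:
    """Return a cleaned title or raise ValueError. Meant to run on the server."""
    title = raw.strip()
    if not title:
        raise ValueError("title is empty")
    if len(title) > MAX_TITLE_LENGTH:
        raise ValueError("title is too long")
    # Reject control characters (Unicode category Cc), e.g. NUL or ESC.
    if any(unicodedata.category(ch) == "Cc" for ch in title):
        raise ValueError("title contains control characters")
    return title

print(validate_title("  I <3 Lemmy "))
```

Note that this only validates. It deliberately leaves `<` alone, since escaping belongs at the point where the text is written into HTML.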
There are a lot of potential pitfalls any time you accept text input from a user, store it, and regurgitate it back to display on a user's browser. The thing is, HTML (and every scripting language embedded in HTML) is just text. So regular words and a block of JavaScript that makes polka-dotted hippos dance across your screen and incessantly play the Hamster Dance song at 200% volume are, without protections, input and stored exactly the same way. Preventing ne'er-do-wells from doing injection attacks with SQL calls, HTML, control and escape characters, JavaScript, etc. is part of a whole industry.
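On the SQL side, the usual protection is a parameterized query, where the text travels separately from the statement. Here's a sketch using Python's built-in `sqlite3`, picked only because it ships with Python. The `comments` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")

# Hypothetical hostile input that tries to break out of the statement.
hostile = "nice post'); DROP TABLE comments; --"

# The ? placeholder passes the value separately from the SQL text,
# so the quote and semicolon are stored as data, not executed.
conn.execute("INSERT INTO comments (body) VALUES (?)", (hostile,))

row = conn.execute("SELECT body FROM comments").fetchone()
print(row[0] == hostile)
```

The hostile string comes back byte for byte and the table is still there, because it was never treated as part of the query.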
It appears Lemmy does filter out raw HTML tags, at least. I tried to insert one in my last comment just for illustration and it was silently removed from the input.
I can't use <3 in a post title without it getting mangled.
That's because the sanitization here is shit, but I bet you'd rather have that than be attacked by stored cross-site scripting attacks :)
There's a git issue on this.