PRQL (Pipelined Relational Query Language)
prql-lang.org
I found this link on Hacker News, and it seems to be a transpiler to generate SQL from a new language.
It's been a long time since I have written SQL, but I'm sure this could be interesting since SQL can be infuriating for most developers I've worked with.
Why would you need to know the eccentricities of SQL? Shouldn't it be enough to just know PRQL? The generated SQL should have the same semantics as the PRQL source, unless the transpiler is buggy.
Because at the end of the day, SQL is what's being run by the database. For example, in the Showcase on the front page, they have an "Orthogonality" example that demonstrates filtering both before and after an aggregation, which compiles to a WHERE clause and a HAVING clause respectively. WHERE and HAVING have very different impacts on SQL queries, and vastly different performance implications, but the simplification in PRQL obscures that complexity.
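To make the WHERE/HAVING distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `employees` table, its columns, and the thresholds are made up for illustration; they are not taken from the PRQL showcase, and the query is hand-written SQL, not actual PRQL compiler output.

```python
import sqlite3

# Hypothetical in-memory table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, country TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "US", 120),
        ("Bob", "US", 40),
        ("Cho", "KR", 90),
        ("Dae", "KR", 30),
        ("Eve", "DE", 45),
    ],
)

# WHERE drops individual rows before they are grouped.
# HAVING drops whole groups after the aggregate has been computed.
query = """
    SELECT country, AVG(salary) AS avg_salary
    FROM employees
    WHERE salary >= 40
    GROUP BY country
    HAVING AVG(salary) >= 60
    ORDER BY country
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('KR', 90.0), ('US', 80.0)]
```

Dae's row never reaches the aggregation because of the WHERE clause, while the DE group is aggregated first (average 45) and only then discarded by HAVING.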
At the end of the day, the transpiled language will have to either support only a subset of SQL's features, or else be at least as complex as SQL. It cannot support all of SQL's features and yet be less complex, because it is just a wrapper around SQL.
I suppose for the right crowd, possibly people who run queries only once and do not care about performance implications, data integrity, etc., this could be a really useful tool. And in all fairness, they mention exactly that on their homepage:
"PRQL’s focus is analytical queries
PRQL was originally designed to serve the growing need of writing analytical queries, emphasizing data transformations, development speed, and readability. We de-emphasize other SQL features such as inserting data or transactions."
But for developers who need to maintain an application database, I don't foresee this becoming a useful substitute for SQL.
What are the implications of WHERE vs HAVING? I thought the only real difference was that one happens before the aggregation and the other happens after, and all the other implications stem from that fact. PRQL's simplification, rather than obscuring that distinction, seems like a clearer and more reasonable way to express it.
I don't know if PRQL supports all SQL features, but I think it could while being less complex than SQL by removing arbitrary SQL complications like different keywords for WHERE vs HAVING, only being able to use column aliases in certain places, needing to recompute a transformation to use it in multiple clauses, not forcing queries to be in SELECT... FROM... WHERE... order, etc.
This is correct. The biggest implication of that difference is that, when you filter rows via a HAVING clause, the query will first select all the rows and aggregate them, and only then begin to filter them. That can be a massive performance hit if you thought that the filter would prevent filtered rows from ever being selected. Of course this makes perfect sense: there's no logical way to filter an aggregate without first aggregating. But it's not obvious.
My main point is that PRQL makes no distinction. If you didn't inspect that SQL output and already know about the difference between WHERE and HAVING, you would have no idea, because in PRQL they're both just "filter". Hence, PRQL is not simplifying the complexity (you still need to learn the full SQL syntax and the specifics of how it works), but it does obscure (you have no hints that one of your filter statements will behave completely differently from the other).
As far as removing arbitrary SQL features, I agree that that is its main advantage. However, I think either the developers or else the users of PRQL will discover that far fewer of SQL's complexities are arbitrary than you might first assume.
Hmm, I have to disagree here. PRQL has no distinction in keyword, but it does have a distinction in where the filter goes relative to the aggregation. Given that the literal distinction being made is whether the filter happens before or after the aggregation, PRQL's position-based distinction seems a lot clearer than SQL's keyword-based distinction. Instead of seeing two different keywords, remembering that one happens before the aggregation and the other after, then deducing the performance impacts from that, you just immediately see that one comes before the aggregation and the other after, then deduce the performance impacts.
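As a rough analogy for the position-based reading, here is a small pipeline in plain Python. This is not PRQL syntax; the data and the helper names `filter_rows` and `average_salary_by_country` are invented for illustration. The same filter function is used twice, and only its position relative to the aggregation step tells you which rows or groups it sees.

```python
from collections import defaultdict

# Hypothetical data, invented for this example.
employees = [
    {"name": "Ann", "country": "US", "salary": 120},
    {"name": "Bob", "country": "US", "salary": 40},
    {"name": "Cho", "country": "KR", "salary": 90},
    {"name": "Dae", "country": "KR", "salary": 30},
    {"name": "Eve", "country": "DE", "salary": 45},
]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [r for r in rows if predicate(r)]


def average_salary_by_country(rows):
    """Group rows by country and compute the mean salary per group."""
    totals = defaultdict(lambda: [0, 0])  # country -> [sum, count]
    for r in rows:
        totals[r["country"]][0] += r["salary"]
        totals[r["country"]][1] += 1
    return [
        {"country": c, "avg_salary": s / n}
        for c, (s, n) in sorted(totals.items())
    ]


# Filter placed before the aggregation: acts on individual rows (like WHERE).
step1 = filter_rows(employees, lambda r: r["salary"] >= 40)
# Aggregation only sees the rows that survived step1.
step2 = average_salary_by_country(step1)
# Same filter function placed after the aggregation: acts on groups (like HAVING).
step3 = filter_rows(step2, lambda r: r["avg_salary"] >= 60)

print(len(step1))  # 4
print(step3)
```

Here four of the five rows reach the aggregation, and the second filter can only run once every remaining group has been averaged.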
That's fair, I was just thinking of things that frustrate me with SQL, but I admittedly haven't thought too hard about why things are that way.
Because when you divide by zero and get a runtime error, the error will point you to a location in the generated SQL, not in the PRQL source.
It's as if an error in a C++ program pointed you to an offset in the binary rather than a location in the source. I say this with a slight tone of sarcasm, because that's how compiled languages used to work. But over the years they patched the leaks in their abstraction, and now you deal only with the new language.