Basically, if you're working with a database that holds genuinely useful or critical data, put an application layer in the middle that gatekeeps LLM access.
- Depending on your risk appetite, decide whether the component that interfaces with the LLM gets a read-only database URL (connection string).
- For maximum control over the queries, you can wrap them in functions (tools) and let the LLM call only those tools, but that takes away a lot of the power. What we've done instead is let the LLM read the context progressively [2] and give it free rein over the queries. (A sketch of the tool-wrapping approach follows this list.)
- The application layer can also ask for approval on every query (human in the loop) [3]; see the second sketch below.
- (Optionally) If a query returns a lot of data, pushing all of it through the LLM's context can get expensive. You need a way to return results out of band, directly to the user; the second sketch below shows one way.
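A minimal sketch of the tool-wrapping approach, in Python (the function name, schema, and database are made up for illustration):

    import sqlite3

    # Tool wrapping: the LLM may only invoke this fixed, parameterized
    # query; it never writes SQL itself.
    def get_orders_for_customer(customer_id: int) -> list[tuple]:
        conn = sqlite3.connect("app.db")
        try:
            # customer_id is bound as a parameter, never string-interpolated
            return conn.execute(
                "SELECT id, total, created_at FROM orders WHERE customer_id = ?",
                (customer_id,),
            ).fetchall()
        finally:
            conn.close()

    # The model only ever sees the tool's name and JSON schema, e.g.
    # {"name": "get_orders_for_customer",
    #  "parameters": {"customer_id": {"type": "integer"}}}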
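And a sketch of the free-form variant with an approval gate and out-of-band results (this is not Inferable's actual API; names and thresholds are illustrative):

    import csv, sqlite3, uuid

    MAX_INLINE_ROWS = 50  # above this, don't push rows through the context

    def run_llm_query(sql: str) -> dict:
        # Human in the loop: an operator approves every LLM-written query.
        print("LLM wants to run:\n" + sql)
        if input("Approve? [y/N] ").strip().lower() != "y":
            return {"error": "query rejected by operator"}

        # Read-only handle, per the risk-appetite point above.
        conn = sqlite3.connect("file:app.db?mode=ro", uri=True)
        rows = conn.execute(sql).fetchall()
        conn.close()

        if len(rows) <= MAX_INLINE_ROWS:
            return {"rows": rows}  # small enough to go back into the context
        # Large result: write it out of band and hand the model a reference.
        path = f"/tmp/result-{uuid.uuid4().hex}.csv"
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        return {"rows_written": len(rows), "download": path}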
[1] https://github.com/inferablehq/inferable/tree/main/data-conn...
[2] https://github.com/inferablehq/inferable/blob/ec504b8593f41b...
[3] https://github.com/inferablehq/inferable/tree/main/data-conn...
DB Level:
- Create read-only SQL credentials for LLM queries, with access limited to what the LLM actually needs (exclude sensitive fields, tables, views, etc.).
- Use SQL views to exclude access to specific fields (example below).
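On Postgres, for example, that could look like the following (role, table, and column names are invented; adapt to your schema):

    import psycopg2  # assumes Postgres; any driver works the same way

    DDL = """
    -- Read-only role used in the LLM-facing connection string.
    CREATE ROLE llm_reader LOGIN PASSWORD 'change-me';
    GRANT USAGE ON SCHEMA public TO llm_reader;

    -- View that hides sensitive columns such as email and password_hash.
    CREATE VIEW users_safe AS
        SELECT id, name, created_at FROM users;

    -- The role can read the view, never the base table.
    GRANT SELECT ON users_safe TO llm_reader;
    """

    with psycopg2.connect("dbname=app user=admin") as conn:
        conn.cursor().execute(DDL)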
App Level:
- Run the SQL through a linter/parser and treat it like any other untrusted user input.
- Parse the SQL query string into a query tree, escape ALL fields, and then reassemble the query from the tree (sketch after this list).
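A sketch of both app-level steps using a SQL parser (sqlglot here; the table whitelist and the forced LIMIT are illustrative policy, not requirements):

    import sqlglot
    from sqlglot import exp

    ALLOWED_TABLES = {"users_safe", "orders"}  # example whitelist

    def sanitize(sql: str) -> str:
        # Lint: parsing rejects malformed SQL outright.
        try:
            tree = sqlglot.parse_one(sql, read="postgres")
        except sqlglot.errors.ParseError as e:
            raise ValueError(f"rejected: {e}")

        # Treat it like untrusted user input: read-only statements only.
        if not isinstance(tree, exp.Select):
            raise ValueError("rejected: only SELECT is allowed")

        # Walk the query tree and check every referenced table.
        for table in tree.find_all(exp.Table):
            if table.name not in ALLOWED_TABLES:
                raise ValueError(f"rejected: table {table.name!r}")

        # Reassemble from the tree; identify=True quotes (escapes) every
        # identifier, and the original string is never reused.
        return tree.limit(1000).sql(dialect="postgres", identify=True)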