How do you analyze conversations with AI agents in your products?

Question

Question to devs who have chat interfaces in their products. Do you monitor what your users are asking for? How do you do it?
Yesterday, a friend asked me this question; he would like to know things like "What do users ask for that my agent can't accomplish?", "What do users hate?", "What do they love?".
A quick insight from another startup: they are quite small, so they just copied all the conversations from their database and asked ChatGPT to analyze them. They found out that the most requested missing feature was being able to use URLs in messages.
I also found an attempt to build a product around this, but it looks like the project has been abandoned: https://web.archive.org/web/20240307011502/https://simplyanalyze.ai/
If there's indeed no solution to this and there are more people other than my friends who want this, I'd be happy to build an open-source tool for this.

kypro · Accepted Answer

I'm sure there are better approaches, but this is what we do at our company, in case it's useful:
We have a config that defines the topics of interest. In your case that would be "things users hate" and "things users love". Then, as a nightly job, we get an LLM to extract values for those topics from any conversations that were active during the previous 24 hours.
When that job finishes we have a list of conversation ids paired with the raw extracted values from conversations.
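
A minimal sketch of what that nightly extraction step could look like, not our actual code. The `TOPICS` config, the `Conversation` and `Extraction` types, and the `extract` callable (a stand-in for whatever LLM call you use) are all hypothetical names chosen for illustration:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable

# Hypothetical topic config: topic key -> instruction passed to the LLM.
TOPICS = {
    "hates": "List things the user dislikes or complains about.",
    "loves": "List things the user praises or enjoys.",
}


@dataclass
class Conversation:
    id: str
    last_active: datetime
    transcript: str


@dataclass
class Extraction:
    conversation_id: str
    topic: str
    value: str  # raw value as returned by the LLM, not yet deduped


def run_nightly_extraction(
    conversations: Iterable[Conversation],
    extract: Callable[[str, str], list],  # (instruction, transcript) -> raw values
    now: datetime,
    topics: dict = TOPICS,
) -> list:
    """Extract raw values per topic for conversations active in the last 24 hours."""
    cutoff = now - timedelta(hours=24)
    results = []
    for conv in conversations:
        if conv.last_active < cutoff:
            continue  # not active during the previous 24 hours
        for topic, instruction in topics.items():
            for value in extract(instruction, conv.transcript):
                results.append(Extraction(conv.id, topic, value))
    return results
```

Passing `extract` in as a parameter keeps the job independent of any particular LLM provider and makes it easy to test with a fake.
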
We then run a second job to analyse what we've extracted, asking the LLM to dedupe, categorise, tag and count the extracted values. So "I hate your website" and "your site sucks" might become "doesn't like website" with a category of "website feedback".
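
Roughly, the second job could be sketched like this. It is a simplification under stated assumptions: `normalise` is a hypothetical callable standing in for the LLM and maps each raw value to a `(label, category, tags)` triple, whereas in practice you would more likely hand the LLM a whole batch to dedupe. The `Insight` type is also made up for the example:

```python
from dataclasses import dataclass, field


@dataclass
class Insight:
    topic: str
    label: str       # deduped value, e.g. "doesn't like website"
    category: str    # e.g. "website feedback"
    tags: set = field(default_factory=set)
    count: int = 0
    conversation_ids: list = field(default_factory=list)


def aggregate(extractions, normalise):
    """Group raw extractions into deduped, counted insights.

    extractions: iterable of (conversation_id, topic, raw_value) tuples.
    normalise:   hypothetical LLM stand-in, raw_value -> (label, category, tags).
    """
    insights = {}
    for conv_id, topic, raw in extractions:
        label, category, tags = normalise(raw)
        key = (topic, label)
        insight = insights.get(key)
        if insight is None:
            insight = insights[key] = Insight(topic, label, category)
        insight.tags.update(tags)
        insight.count += 1
        # Keep a link back to each source conversation, without duplicates.
        if conv_id not in insight.conversation_ids:
            insight.conversation_ids.append(conv_id)
    # Most frequently mentioned values first.
    return sorted(insights.values(), key=lambda i: i.count, reverse=True)
```
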
Then when clients view their analysis we show them those deduped values for each topic of interest, ordered by the total count. We also maintain links back to the conversation so they can dig deeper to understand the context. They can filter on tags and categories.
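
The view itself is then just a filter and a sort. A small sketch, assuming each deduped value is stored as a dict with hypothetical keys `topic`, `label`, `category`, `tags`, `count` and `conversation_ids`:

```python
def view_topic(insights, topic, category=None, tag=None):
    """Return the insights for one topic, optionally filtered, highest count first."""
    rows = [
        row for row in insights
        if row["topic"] == topic
        and (category is None or row["category"] == category)
        and (tag is None or tag in row["tags"])
    ]
    return sorted(rows, key=lambda row: row["count"], reverse=True)
```

The `conversation_ids` on each returned row are what lets the client click through to the original conversations.
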
Like I say, probably better approaches, but this works well enough for us.