What I don't understand is how ChatGPT (or whatever) understands what I write. No matter how I phrase it, or how subtle or abstract the point or problem is, AI almost always figures out what I mean. I am mystified and constantly amazed at how AI processes input.
What mechanism is at work here? If I want to deep dive on how AI understands meaning, what technology or concept do I need to research?
ChatGPT (and other LLMs) is trained on a massive corpus of text: books, articles, websites, etc. From that training, the model learns patterns in how words, phrases, and sentences relate to one another. It doesn't explicitly understand what a "dog" or "love" means in the human sense, but it learns patterns in how those words are expressed and used in language.
Without going into too much detail, it also relies on techniques like Probabilistic Modeling and Semantic Representations to produce the answers it gives you today.
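To make "probabilistic modeling" a bit more concrete, here's a tiny Python sketch of what "predicting the next token" means: score every candidate token, then turn the scores into probabilities. The words and numbers are completely made up for illustration; this is not what GPT actually runs.

```python
import math

# Hypothetical scores the model might assign to candidate next tokens
# for the prompt "The dog chased the ..." -- the numbers are invented.
logits = {"ball": 4.2, "cat": 3.7, "mailman": 2.1, "theorem": -1.5}

# Softmax: exponentiate each score and normalize so the values sum to 1,
# turning raw scores into a probability distribution over next tokens.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:>8}: {p:.3f}")
# The model then picks (or samples) the next token from this distribution.
```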
If you wish to dive deeper and do some research, I'd recommend checking out the following:
1. Transformer Architecture
2. Self-Attention Mechanism
3. Pre-trained Language Models
4. Embeddings and Semantic Space
5. "Attention Is All You Need" - the paper by Vaswani et al. that introduced the transformer; it's key to understanding the self-attention mechanism and how it powers modern NLP models like GPT.
6. Contextual Language Models
I think those six should cover all your questions and doubts.
In practice, the model first converts your text into embeddings: numerical vectors that capture meaning, where related words end up close together in that space. For example, the vectors for "king" and "queen" will end up really close together, while the vectors for "king" and "table" will be far apart.

Then the transformer part kicks in with its self-attention mechanism. That's a fancy way of saying it analyzes how all the words in your text relate to each other and figures out how much "attention" to pay to each one. This is how the model gets context. It's how it knows that the "bank" in "river bank" is totally different from the "bank" in "open a bank account".

Based on all those relationships, it then predicts the next token. But it's not just guessing: it's making a highly probable prediction based on all the context it just looked at.
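Here's a rough Python sketch of both ideas: toy embedding vectors (the numbers are invented; real models learn vectors with hundreds or thousands of dimensions) and a stripped-down, single-head self-attention step. It illustrates the mechanism, not how GPT is actually implemented.

```python
import numpy as np

# 1) Embeddings: words become vectors, and related words sit close together.
#    These 4-dimensional vectors are made up purely for illustration.
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.8, 0.9, 0.1, 0.1]),
    "table": np.array([0.0, 0.1, 0.9, 0.8]),
}

def cosine(a, b):
    # Cosine similarity: close to 1 means similar direction, i.e. related words.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))  # high  -> related
print(cosine(emb["king"], emb["table"]))  # lower -> unrelated

# 2) Self-attention (scaled dot-product): every token compares itself against
#    every other token to decide how much "attention" to pay to each one.
def self_attention(X):
    # A real transformer would first project X with learned query/key/value
    # weight matrices; we skip that here to keep the sketch short.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # pairwise relevance scores
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ X                              # context-mixed representations

sentence = np.stack([emb["king"], emb["queen"], emb["table"]])
print(self_attention(sentence).round(2))
```

Each output row is a blend of all the input vectors, weighted by relevance; that blending is what lets the model read "bank" differently depending on the words around it.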
To put it simply: the model isn't "aware" of what anything means. It's just incredibly good at modeling how meaning is expressed in language.