## A simple counting problem 

For most people, counting is a simple operation that boils down to adding numbers. From an early age, most of us know how to compute 1 + 1, 2 + 2, and 2 - 1. It is an intuitive operation that is ubiquitous in our daily lives. Since LLMs are built to generate the responses a human would most likely give, one would expect them to have no trouble computing the total price of several items, calculating means and medians, or counting the letters in a single word. However, that last task might not be so easy for them. To see what I mean, let's go through a sample question:

> Count the number that each letter appears in the word college.

You will probably answer something like **c = 1, o = 1, l = 2, e = 2, g = 1**...

And that's it! There is no interpretation, no magical formula, no advanced counting theorem. Just pure and simple math. However, how do we really perform this operation? 

### How do we do it?

Some people might deliberate a bit on these simple additions, while others respond to the question instantly. Whether it is our automatic or our analytical brain system (shoutout to Daniel Kahneman) that performs the operation, one thing is for sure: it is a simple counting operation.

We identify the characters visually, based on our prior knowledge of the alphabet, and then add up how many times each character appears in the word. This ability to perform simple operations (which are not so simple for a machine) comes from our brain's impressive capacity for interpreting, storing, and reusing visual data. It is as simple as that: you look, interpret, and do whatever you want with the information, without any additional step in between. However, that is not how LLMs work.
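The look-and-tally procedure described above can be sketched in a few lines of Python. This is only an illustration of the counting steps, not a model of how the brain works; the function name `count_letters` is made up for this example.

```python
def count_letters(word: str) -> dict[str, int]:
    """Tally how often each character appears, in first-seen order."""
    counts: dict[str, int] = {}
    for ch in word:  # look at one character at a time
        counts[ch] = counts.get(ch, 0) + 1  # add 1 to that character's tally
    return counts


print(count_letters("college"))
# {'c': 1, 'o': 1, 'l': 2, 'e': 2, 'g': 1}
```

The key point is that the code, like a human reader, has direct access to every individual character.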

To perform any type of operation, LLMs rely on a process called tokenization. Tokenization is one of the fundamental preprocessing steps for LLMs: it converts text, usually words or pieces of words, into a sequence of integers. For instance, while we read "dog" as just "dog," a machine might see it as an ID such as 001. This step is pivotal for generative AI, as it lets the model learn which tokens most often follow "dog" in a sentence and, thus, come up with a plausible answer.
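To make the idea concrete, here is a minimal sketch of word-level tokenization. The vocabulary and its IDs are invented for illustration; real LLM tokenizers use far larger vocabularies and more elaborate splitting rules.

```python
# Hypothetical toy vocabulary: the IDs are invented for illustration.
TOY_VOCAB = {"the": 0, "dog": 1, "barks": 2, "<unk>": 3}


def encode(text: str) -> list[int]:
    """Map each whitespace-separated word to its integer ID."""
    unknown_id = TOY_VOCAB["<unk>"]  # fallback for words outside the vocabulary
    return [TOY_VOCAB.get(word, unknown_id) for word in text.lower().split()]


print(encode("The dog barks"))  # [0, 1, 2]
```

Once the text is encoded, the model works with the integers only; the word "dog" has become the single number 1.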


**But letters are a whole different story...**

You might be wondering: what if we want to analyze the letters in a word? Tokenizers typically operate on words or word pieces rather than on individual characters, so the model does not directly see the letters it is asked to count, which turns out to be troublesome for some LLMs. In the case of Copilot, the examples below show that it can struggle with this task. Check it out below.

### How does Copilot respond to the prompt?

**GPT3.5 Turbo**

<img src="anthropology.png" width="50%"/>

* As you can see, it struggles to process the word **anthropology**.

**What about other words?**

<img src="college.png" width="50%"/>

* This time, though, Copilot successfully counted the letters in the word **college**.

**Is it the same thing for ChatGPT 4?**

<img src="gpt.png" width="50%"/>

* Contrary to what you might expect, not quite. Supposedly, Copilot uses OpenAI's GPT-4, too. However, when we enter the exact same prompt in ChatGPT 4, it **answers correctly**!

**What can we conclude with this experiment?**

I would say that the results show us two important things: LLMs are still, in general, not good at processing the individual letters within words, and ChatGPT 4 and Copilot are perhaps not as similar as we thought.

Regarding the first point, running this experiment makes it easy to see the limitations of tokenization compared with the way the human brain reads words and counts. Probably, for less common words like **anthropology**, Copilot attributes the same tokens to different letters or treats syllables as single units. Meanwhile, for more common English words like **college**, its tokenization is better optimized and thus causes no problems of interpretation or computation.
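A toy example can illustrate this hypothesis. The sketch below uses a greedy longest-match splitter with a made-up subword vocabulary; it is not Copilot's or ChatGPT's actual tokenizer, and the way it splits these two words is an assumption chosen only to show the idea.

```python
# Hypothetical subword vocabulary, invented for illustration only.
TOY_SUBWORDS = {"college", "anthrop", "ology"}


def tokenize(word: str, vocab: set[str] = TOY_SUBWORDS) -> list[str]:
    """Greedy longest-match split; unknown characters become single tokens."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest piece first, shrinking until one is in the vocabulary.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # no match: fall back to a single character
            i += 1
    return tokens


print(tokenize("college"))       # ['college']
print(tokenize("anthropology"))  # ['anthrop', 'ology']
```

In this sketch, **anthropology** reaches the model as two opaque pieces rather than twelve letters, and its three o's are spread across both pieces, so the letter counts are never directly visible.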


**What about ChatGPT 4?**

Well, that is a topic for another article, but at least in theory, Copilot should behave similarly to ChatGPT 4. Perhaps this indicates that ChatGPT 4 and Copilot are, in fact, not as similar as Microsoft suggested...


**Thank you for reading!**