The rise of Large Language Models: from fundamentals to application





"Artificial Intelligence is the most profound technology that humanity is working on, more profound than fire, electricity or anything else we've done in the past. It gets to the essence of what intelligence is, what humanity is. It will certainly someday be far more capable than anything we've seen before."

This is Google CEO Sundar Pichai's view on the rise of artificial intelligence (AI), which not only highlights its depth and potential, but also positions AI as a milestone in the history of technological and human development.

Generative Artificial Intelligence (GenAI) and, within it, Large Language Models (LLMs) are emerging as the most significant manifestations of this transformation.

It is important to note that this breakthrough is a logical consequence of the digital transformation process, driven by advances in data storage and processing, data availability, and new modeling techniques, without which this milestone would not have been possible.

GenAI refers to artificial intelligence systems capable of generating new and original content, be it text, images, video, voice, music, 3D models, or programming code. These systems learn from massive amounts of existing data and can produce outputs that, in many cases, are indistinguishable from those created by humans. This ability to create content opens up new possibilities in all areas of every industry, with implications that are still difficult to predict.

Specifically, GenAI is finding potentially revolutionary applications in areas such as education, where it can personalize and enhance learning; healthcare, where it can facilitate more accurate diagnoses and the development of individualized treatments; finance, where it can improve risk analysis and fraud detection; commerce, where it can optimize the supply chain and the customer experience; art, where it can open up new creative possibilities; and law, where it can streamline contract review and predict legal outcomes, to name just a few.

Within GenAI, LLMs (such as OpenAI ChatGPT, Anthropic Claude, Google Gemini, Meta Llama, Mistral or SenseTime SenseNova) represent a disruptive advance in natural language processing. These models are able to analyze and generate text with a level of coherence, relevance, and fluency previously unattainable by other algorithms. Their applications range from writing assistance and idea generation to automated translation, the generation of full reports that cite relevant articles and regulations, and the creation of more natural and effective conversational interfaces ("chatbots").
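As a simple illustration of such a conversational interface, the sketch below sends a user question to an LLM through the OpenAI Python SDK and returns the generated reply. It is a minimal sketch, not a production design: the model name, system prompt, and helper function are illustrative assumptions, and a real deployment would add authentication management, error handling, and the safeguards discussed later in this document.

```python
# Minimal sketch of an LLM-backed conversational interface ("chatbot").
# Assumes the OpenAI Python SDK (openai >= 1.0) and an API key available in the
# OPENAI_API_KEY environment variable; the model name below is illustrative.
from openai import OpenAI

client = OpenAI()

def ask(question: str, history: list[dict] | None = None) -> str:
    """Send the conversation so far plus a new user question and return the reply."""
    messages = [
        {"role": "system", "content": "You are a concise, helpful assistant."},
        *(history or []),
        {"role": "user", "content": question},
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative model choice
        messages=messages,
        temperature=0.2,       # lower temperature favors more literal, factual replies
    )
    return response.choices[0].message.content

print(ask("Summarize the main business applications of large language models."))
```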

GenAI, including LLMs, is influencing our interaction with technology and information, helping to transform content creation, data-driven decision making, and the way we interact with machines. Despite still being in its early stages, its full impact is yet to be determined. In this sense, it is already being used to create advanced virtual assistants, voice and gesture interfaces for controlling home devices, instant translation interfaces, and in integration with augmented reality and virtual reality technologies.

At the enterprise level, most large companies are already developing LLM-based systems to industrialize processes, including customer service, data analysis, reporting, and the automation of repetitive tasks. According to a Microsoft study, integrating LLMs as copilots in office automation tools yields time savings of between 27% and 74% without compromising quality. Among SMBs, adoption is still limited, which creates an even greater risk of a technology gap for this segment.

When properly applied, LLMs have the potential to optimize processes, reduce time, and save costs. In addition, they can improve the objectivity and quality of documents, reduce errors, offer new ways of interacting with customers and, thanks to their ability to analyze massive amounts of information, provide access to previously unavailable knowledge due to processing and comprehension limitations. However, it is important to remember that successful optimization depends on factors such as data quality, learning complexity, and the appropriateness of the model to the problem at hand.

Going further, some experts see LLMs as a step toward the creation of Artificial General Intelligence (AGI), a medium-term goal in which AI could perform a wide range of intellectual tasks that humans can. However, the concept of AGI remains vague, and its feasibility is subject to significant cultural, political, and legal constraints, such as ethics and privacy, which would require further specification and analysis. It is also crucial to recognize the inherent limitations of AI, which, according to philosophers of language such as John Searle and his "Chinese Room" experiment, lacks the capacity for abstraction and for associating concepts with symbols, attributes unique to the human mind.

According to several experts, AGI could be achieved between 2029 and 2035, or even sooner. While today's AI specializes in specific tasks ("narrow AI") and LLMs are beginning to exhibit general capabilities, AGI promises much broader versatility and adaptability. Although there is already specialist AI that outperforms 100% of humans (e.g., chess-playing AI), Google DeepMind estimates that the progress of AGI (e.g., LLMs) is currently at a level of only 1 out of 5; i.e., just in its infancy (see the table below).

 

Levels of AGI (Google DeepMind), by performance (rows) and generality (columns). Specialist: limited, clearly defined task or set of tasks. General: broad range of non-physical tasks, including metacognitive skills such as learning new skills.

Level 0: No AI.
Specialist (No AI): calculators, compilers.
General (No AI): human-in-the-loop computing, e.g., Amazon Mechanical Turk.

Level 1: Emerging (equal to or somewhat better than an unskilled human).
Specialist (Emerging narrow AI): GOFAI; simple rule-based systems, e.g., SHRDLU.
General (Emerging AGI): ChatGPT, Gemini, Claude, Llama.

Level 2: Competent (at least at the 50th percentile of skilled adults).
Specialist (Competent narrow AI): toxicity detectors such as Jigsaw; Siri (Apple), Alexa (Amazon), Google Assistant (Google); VQA systems such as PaLI; Watson (IBM); state-of-the-art LLMs on a subset of tasks (e.g., short essay writing, simple coding).
General (Competent AGI): not yet achieved.

Level 3: Expert (at least at the 90th percentile of skilled adults).
Specialist (Expert narrow AI): spelling and grammar checkers such as Grammarly; generative image models such as Imagen or DALL-E 2.
General (Expert AGI): not yet achieved.

Level 4: Virtuoso (at least at the 99th percentile of skilled adults).
Specialist (Virtuoso narrow AI): Deep Blue, the IBM chess computer that defeated the world champion in 1997; AlphaGo, the DeepMind AI that defeated world-class Go players.
General (Virtuoso AGI): not yet achieved.

Level 5: Superhuman (outperforms 100% of humans).
Specialist (Superhuman narrow AI): AlphaFold, which predicts protein structures with high accuracy; AlphaZero, a self-taught AI that masters chess, Go and shogi; Stockfish, a powerful open-source chess engine.
General: Artificial Superintelligence (ASI), not yet achieved.
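To make the two axes of this classification explicit, the sketch below encodes the taxonomy as a small data structure. The level names and examples are taken from the table above; the class and field names are illustrative choices of this sketch, not part of DeepMind's framework.

```python
# Illustrative encoding of DeepMind's "Levels of AGI" taxonomy (performance x generality).
# Level names and examples follow the table above; class and field names are our own.
from dataclasses import dataclass

@dataclass(frozen=True)
class AGILevel:
    level: int                # performance dimension (0 to 5)
    name: str                 # label of the performance level
    specialist_examples: str  # narrow AI: limited, clearly defined tasks
    general_examples: str     # general AI: broad range of non-physical tasks

LEVELS = [
    AGILevel(0, "No AI", "calculators, compilers", "human-in-the-loop computing (Amazon Mechanical Turk)"),
    AGILevel(1, "Emerging", "rule-based systems, e.g., SHRDLU", "ChatGPT, Gemini, Claude, Llama"),
    AGILevel(2, "Competent", "Jigsaw, Siri, Alexa, Watson, SOTA LLMs on some tasks", "not yet achieved"),
    AGILevel(3, "Expert", "Grammarly, Imagen, DALL-E 2", "not yet achieved"),
    AGILevel(4, "Virtuoso", "Deep Blue, AlphaGo", "not yet achieved"),
    AGILevel(5, "Superhuman", "AlphaFold, AlphaZero, Stockfish", "Artificial Superintelligence (ASI), not yet achieved"),
]

# General-purpose systems such as LLMs currently sit at level 1 ("Emerging AGI") on the general axis.
print(next(l for l in LEVELS if l.level == 1).general_examples)
```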

 

However, with these advances in GenAI and LLMs come significant risks, ethical considerations and challenges, including data privacy and information security; difficulties in model interpretability; generation of false or misleading information ("hallucinations"); propagation of bias, discrimination and inappropriate or toxic content; challenges in AI regulation and governance; regulatory non-compliance with potential sanctions; intellectual property, copyright, authorship and plagiarism issues; high resource consumption and environmental impact; the "Eliza Effect", overconfidence and reduced critical capacity; ethical risks in automated decision making; risk of overreliance on AI for critical tasks; risks of using LLMs for manipulation and misinformation; risk of human job replacement; need for job transition and training; and inequalities in access to and use of AI technologies, to name a few of the most important.

Specifically, LLMs can generate hallucinations, i.e., false or misleading information, which, combined with the "Eliza Effect", where users attribute human cognitive abilities to these systems, can lead to overconfidence, dependency, or misinterpretation, and thus to wrong decisions.

In the face of these challenges, regulators are taking proactive steps at the national and international levels to address the risks and opportunities of AI. Of particular note is the Bletchley Declaration, signed by the European Union and 27 countries (including the United States, United Kingdom, China, India, Brazil, and Australia) in November 2023, which sets out a global commitment to the responsible development of AI. For its part, the European Union, with the imminent implementation of the Artificial Intelligence Act, is introducing the first comprehensive legally binding framework that classifies AI systems according to their risk and sets very strict standards, especially for high-risk systems. And in the United States, President Biden's Executive Order, issued on October 30, 2023, and the Blueprint for an Artificial Intelligence Bill of Rights set standards to ensure the safety, reliability and fairness of AI, with a focus on privacy, civil rights, consumer protection and international leadership in AI governance.

In this context, organizations are defining their AI strategy (with a special focus on GenAI and LLMs), designing their AI adoption plan, and adapting their structures, including the creation of GenAI centers of excellence and the incorporation of new figures such as the Chief AI Officer. Existing management frameworks (model risk, data protection, cybersecurity, etc.) are being adapted accordingly to address AI-specific challenges. This involves adjusting risk appetite, reviewing and updating policies and procedures, and conducting a thorough review of the technology stack and data; all of which entails a review of the entire lifecycle of AI systems, from design to deployment and maintenance, to ensure that they conform to ethical, security and compliance standards. 

This white paper examines the current LLM landscape and its future prospects. Through detailed analysis, case studies, and discussion of current trends and challenges, this paper covers key aspects of the context and definition of LLMs, their evolution, use in organizations, regulatory requirements, typologies, critical aspects of their development and architecture, and concludes with a framework for validating LLMs (including interpretability and bias and discrimination analysis) and a case study to illustrate its application.

 

Table of Contents


Chapter 1. Introduction
Chapter 2. Executive summary
Chapter 3. LLM: definition, context and regulation
Chapter 4. LLM: development and deployment
Chapter 5. LLM: validation framework
Chapter 6. Case study: validation of a policy chatbot
Chapter 7. Conclusions
Glossary & References


