Cedille, the largest French AI language model, is actually from Switzerland


There’s a new large language model (LLM) for the French language. It’s called Cedille, it’s the largest ever made for French, and it’s actually from Switzerland. Swiss company Coteries, working from the EPFL Innovation Park in Lausanne, built Cedille on GPT-J, an open-source model developed by EleutherAI.

Cedille has 6 billion parameters and was trained thanks to a donation of resources from Google’s TPU Research Cloud (TRC), the programme that gives researchers access to a cloud cluster of more than a thousand TPUs (Tensor Processing Units, chips designed specifically for AI workloads). On top of that, Cedille is open source and can be freely downloaded.

This is very good news for the French-speaking ecosystem, which will benefit from higher-quality natural language processing (NLP) services. It is no small feat, considering that the size of a large language model, commonly measured in parameters, is strongly correlated with the quality of its results. In other words, the larger the LLM, the higher the quality of its output tends to be, and the more language-intensive tasks artificial intelligence can take on.

In this respect, Cedille, with its six billion parameters, is by all accounts the largest French-language model, overtaking PAGnol, released in May this year, which stands at 1.5 billion. Before then, the largest French-language models were CamemBERT, with 110 million parameters in its base version and 335 million in its large version, and FlauBERT, with 138 million parameters in its basic version and 373 million in the extended version.

(If you are wondering about the all-caps BERT in some model names, it’s because they are based on Google’s BERT or Facebook’s RoBERTa models.)

However, none of these efforts can yet match OpenAI’s GPT-3 and its 175 billion parameters, Google’s Switch Transformer with 1.6 trillion parameters or the Chinese Wu Dao 2.0 with 1.75 trillion parameters. These numbers are unattainable without multi-billion-dollar investments, which are unlikely to be forthcoming for ‘regional’ languages. Still, Cedille and its 6 billion parameters put the world’s 7th most spoken language back in the fray. Chapeau!
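To put the parameter counts quoted above side by side, here is a minimal Python sketch that expresses each model's size as a multiple of Cedille's. The figures are the ones given in the article and are not independently verified here. The dictionary `PARAMS` and the helper `size_ratio` are illustrative names only and are not part of any Cedille tooling.

```python
# Parameter counts as quoted in the article; not independently verified here.
PARAMS = {
    "FlauBERT (extended)": 373e6,
    "PAGnol": 1.5e9,
    "Cedille": 6e9,
    "GPT-3": 175e9,
    "Wu Dao 2.0": 1.75e12,
}


def size_ratio(model: str, reference: str = "Cedille") -> float:
    """Return how many times larger `model` is than `reference`."""
    return PARAMS[model] / PARAMS[reference]


if __name__ == "__main__":
    # Print each model's size relative to Cedille, smallest first.
    for name in sorted(PARAMS, key=PARAMS.get):
        print(f"{name:<20} {size_ratio(name):8.2f}x Cedille")
```

By these figures Cedille is four times the size of PAGnol, while GPT-3 is roughly 29 times the size of Cedille.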

I am a partner and founder of SNGLR Holding AG, a Swiss group specialising in exponential technologies with offices in Europe, the USA and the UAE, where I supervise projects on applied artificial intelligence. I have an Artificial Intelligence Professional certificate from IBM and one on machine learning from Google Cloud. I am a member of several AI industry associations: AAAI, ACM (SIGAI), AIxIA. I participate in the European AI Alliance of the European Commission and I work with the European Defence Agency and the Joint Research Centre.