Building on the existing foundation of Software Heritage, the largest publicly available source code archive, CodeCommons aims to bring into one place all the critical and qualified information needed to create smaller, better datasets for the next generation of AI tools.
At its core, the project prioritizes transparency and traceability, enabling model builders and users to respect creators' rights while promoting sovereign and sustainable AI.
The power of AI is accessible to everyone and serves as a force for good
Developers and researchers have an unparalleled resource for building transparent, traceable, and ethical AI systems
Governments and organizations can ensure AI innovation aligns with principles of sustainability and sovereignty
CodeCommons isn't just a project; it's a movement towards an ethical, transparent, and accessible AI future. Together, we're laying the groundwork for the next generation of AI.
Join our community and help shape the future of AI: sign up for our mailing list to stay informed and connected.
Software Heritage - INRIA Universal archive of source code
DiverSE - INRIA Software engineering, code, programming, languages, managing software variability. Large-scale software evolution and generative AI for software development
ALManaCH - INRIA Modeling and automatic linguistic analysis and computational humanities
CEDAR - INRIA Analysis and processing of complex data on a large scale
DILS - CEA Engineering, software, and systems
DIASI - CEA Natural language processing, generative AI
Tweag - Modus Create Machine learning, modeling, natural language processing, distributed computing
AboutCode The global reference for license detection
Inria Emeritus - Patrick Valduriez Advanced expertise in massive data management
Sant'Anna Data compression and text algorithms (ACM Paris Kanellakis award 2022)
University of Pisa Expertise in massively parallel HPC programming
University of Bologna Expertise in machine learning and text similarity
University of Turin EuroHPC and expertise in efficient low-level distributed structures