Building on the existing foundation of Software Heritage, the largest publicly available source code archive, CodeCommons aims to bring into one place all the critical and qualified information needed to create smaller, better datasets for the next generation of AI tools.
At its core, the project prioritizes transparency and traceability, enabling model builders and users to respect creators' rights while promoting sovereign and sustainable AI.
The power of AI is accessible to everyone and serves as a force for good
Developers and researchers have an unparalleled resource for building transparent, traceable, and ethical AI systems
Governments and organizations can ensure AI innovation aligns with principles of sustainability and sovereignty
CodeCommons isn't just a project; it's a movement towards an ethical, transparent, and accessible AI future. Together, we're laying the groundwork for the next generation of AI.
Join our community and help shape the future of AI: Sign up for our mailing list to stay informed and connected.
Universal archive of source code
Software engineering, code, programming, languages, managing software variability. Large-scale software evolution and generative AI for software development
Modeling, automatic linguistic analysis, and computational humanities
Analysis and processing of complex data on a large scale
Engineering, software, and systems
DIASI - CEA
Natural language processing, generative AI
Machine learning, modeling, natural language processing, distributed computing
The global reference for license detection
Advanced expertise in massive data management
Data compression and text algorithms (ACM Paris Kanellakis Award, 2022)
Expertise in massively parallel HPC programming
Expertise in machine learning and text similarity
EuroHPC and expertise in efficient low-level distributed structures