[CROSS POST] How To Harness the Power of Digital Technologies for Non-Profits

The Ersilia Open Source Initiative (EOSI) is a non-profit organization with the mission to strengthen research capacity against infectious and neglected diseases. To this end, we leverage Artificial Intelligence and Machine Learning (AI/ML) methods to improve drug discovery pipelines. Essentially, by testing more molecules in less time, these technologies can lower the barrier to the development of new medicines, and they represent a cost-effective solution for research in low-resource settings.

The cost of bringing a new drug to market is estimated to be over 1.5 billion USD, a prohibitive figure for diseases with a low expected return on investment. This translates into a research bias towards certain therapeutic areas, with over 85% of the drugs in development targeting non-communicable diseases. Meanwhile, communicable or infectious diseases still account for over 50% of the disease burden in Low- and Middle-Income Countries (LMICs). The best avenue to improve research efforts in this area is to empower researchers in regions where these diseases are endemic, as they are also the best placed to answer the specific needs of their countries. AI/ML tools hold the promise to revolutionize drug discovery, but the need for advanced computer skills and the lack of unified frameworks for dissemination are limiting their use in day-to-day experiments. Our goal is to provide an unprecedented catalogue of ready-to-use AI/ML models via the Ersilia Model Hub, a free, open-source online platform devised for non-expert scientists. These models can be used to reveal novel therapeutic applications of existing drugs (a.k.a. drug repurposing), of natural products identified in herbal remedies, or of any other compounds that are in patent-free status or were discontinued by pharmaceutical companies for marketing reasons.

Early in our journey, we identified the Principles for Digital Development and adopted their core tenets as a roadmap to make computer-based science more reusable, closer to experimental scientists, and built to bridge research inequalities.

  1. Design with the user: the Hub is designed entirely with its end users. The Ersilia team includes non-computer scientists with many years of experience in academia, which gives the developers valuable insight into how to adapt the tool to users' needs. We are creating the tool we wanted to have during our time as PhD students and postdoctoral researchers.
  2. Understand the ecosystem: each tool has to account for the particular traits of the region where it will be deployed. In our case, some of the issues we face are low internet bandwidth (so large online queries are not feasible), limited access to subscription journals (how do we use data from non-open articles if our users cannot access them?), and the use of different operating systems (the majority of our collaborators use Windows, whereas we are Unix-based). To us, all these questions are crucial, and we devote as much time to them (if not more) as to platform development itself.
  3. Design for scale: moving beyond the pilot phase is our major challenge today. The platform is ready, we have a few models in it, and we are ready to release it to the world. Our major fears are failing to answer real-world needs and failing to generate enough traction, which is why we are attracting as many early users and early contributors as possible, so that we can re-adapt and continue growing in a sustainable manner.
  4. Build for sustainability: there are a number of similar resources and databases with great potential, developed by academic research groups, that are released and published once but not properly maintained afterwards. In contrast, the Ersilia Model Hub is the main asset of EOSI’s charity, which ensures its long-term maintenance. We are now exploring the viability of our economic model, as we wish to keep the Hub free for everyone.
  5. Be data driven: a successful outcome is measured not only by the scientific output (i.e., research publications), but also by the changes that the research promotes. A data-driven decision-making approach is based on comprehensive reviews and quality information. Towards this end, we are establishing an innovative model of research collaborations that puts capacity training and project management at the forefront. We share these outcomes with all stakeholders and participants, with the goal of informing decision making and creating a long-lasting effect. We are particularly fond of our visuals, as we are convinced that beautifully represented data is more impactful than text.
  6. Use open standards, open data, open source and open innovation: open science is the driving force of our initiative. Free sharing of our code (open source) and of the data used to train our models (open data) increases the reliability of our AI/ML models, facilitating their acceptance by the scientific audience. Moreover, it creates a positive feedback loop, where the software itself is improved thanks to an ever-growing community of open-source contributors. You can have a look at our GitHub repository. In some cases, we use private datasets to train specific models. These data points cannot be disclosed, but AI/ML gives us a way to take advantage of them while maintaining their privacy: scientists can now access new predictive models based on data that would otherwise have remained shelved in laboratories and pharma companies.
  7. Reuse and improve: AI/ML provides a great avenue to take advantage of scientific data that has already been produced. Basically, we collect millions of data points corresponding to experimental results and use them to train our models. This allows us to effectively transform published data into useful predictors for future experiments. In addition, we curate the scientific literature to identify relevant AI/ML models and incorporate them into our Hub, deploying them within our user-friendly environment. Original authors are always acknowledged.
  8. Address privacy and security: we follow the strictest data protection protocols and do not have access to any personal or sensitive information, such as non-anonymized patient data.
  9. Be collaborative: collaboration is the central pillar of our organization. To engage scientists and the general public, obtain feedback, increase our visibility and, in short, create a community, we are putting extra effort into social media, blog posts and day-to-day communication channels.

We are convinced that adhering to these guidelines is the only way towards our vision of a world with egalitarian research capacity.

Gemma Turon

Project Manager at Ersilia Open Source Initiative