Long-acting injectables (LAIs) are considered one of the most promising therapeutic strategies for the treatment of chronic diseases, as they can provide improved therapeutic efficacy, safety, and patient compliance. The use of polymer materials in such a drug formulation strategy can offer unparalleled diversity owing to the ability to synthesize materials with a wide range of properties. However, the interplay between multiple parameters, including the physicochemical properties of the drug and polymer, makes it very difficult to intuitively predict the performance of these systems. This necessitates the development and characterization of a wide array of formulation candidates through extensive and time-consuming in vitro experimentation. Machine learning (ML) is enabling major advances in a number of fields, including drug discovery and materials science. Our study takes a critical step towards data-driven drug formulation development, with an emphasis on LAIs. A series of ML algorithms were trained and refined to accurately predict experimental drug release profiles using a dataset of in vitro release profiles compiled from published studies.
The dataset was constructed from previously published studies by our research group and other research groups. The studies performed by our group include spherical and cylindrical polymeric LAIs. Data from external sources were identified using the Web of Science search engine with the keyword combination “polymeric microparticle” and “drug delivery”. Information related to the preparation, final composition, and drug release kinetics of each LAI was collected. Release data were primarily extracted from figures of in vitro drug release profiles using the “GetData Graph Digitizer” application. The final dataset contained 181 drug release profiles for 43 unique drug-polymer combinations, comprising 3783 individual fractional release measurements in total. The initially collected dataset consisted of a table of drug and polymer names, physicochemical properties of the formulations, and fractional drug release values at various timepoints. To construct and train ML models from these data, the various elements must be described using machine-readable descriptors, which were generated using RDKit. The polymers and LAI formulations were described exclusively using information reported in the relevant published articles; these included polymer molecular weight (Polymer_MW), lactide-to-glycolide ratio (LA/GA; for non-P...
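The collected table pairs each formulation with fractional release values at various timepoints, and those values must be turned into model inputs. Below is a minimal sketch of one common way to do this: reshape each profile into one row per fractional release measurement, so that time becomes an input feature. It assumes a hypothetical per-profile schema. The `ReleaseProfile` class, the `to_long_format` helper, the drug names, and all numeric values are illustrative only and are not the authors' code or data. Only the `Polymer_MW` and `LA/GA` column names come from the text.

```python
from dataclasses import dataclass


@dataclass
class ReleaseProfile:
    """One digitized in vitro release profile (hypothetical schema)."""
    drug: str
    polymer: str
    features: dict[str, float]        # formulation descriptors, e.g. Polymer_MW, LA/GA
    timepoints: list[float]           # sampling times (units are an assumption)
    fractional_release: list[float]   # fraction of drug released at each timepoint

    def __post_init__(self):
        # Every timepoint must have exactly one measured release value.
        if len(self.timepoints) != len(self.fractional_release):
            raise ValueError("each timepoint needs exactly one release value")


def to_long_format(profiles):
    """Flatten profiles into one row per fractional release measurement.

    Each row repeats the profile's formulation features and adds the time,
    so a regression model can learn release as a function of (features, time).
    """
    rows = []
    for profile_id, p in enumerate(profiles):
        for t, frac in zip(p.timepoints, p.fractional_release):
            row = {"profile_id": profile_id, "drug": p.drug, "polymer": p.polymer}
            row.update(p.features)
            row["Time"] = t
            row["Release"] = frac
            rows.append(row)
    return rows


# Toy, made-up profiles used only to illustrate the reshaping.
profiles = [
    ReleaseProfile("drug_A", "PLGA", {"Polymer_MW": 24.0, "LA/GA": 1.0},
                   [0.0, 1.0, 7.0], [0.0, 0.15, 0.62]),
    ReleaseProfile("drug_B", "PLGA", {"Polymer_MW": 45.0, "LA/GA": 3.0},
                   [0.0, 3.0], [0.0, 0.30]),
]
rows = to_long_format(profiles)
print(len(rows))  # 5
```

In this layout, the number of rows equals the total number of release measurements rather than the number of profiles. This is why a dataset of 181 profiles can yield 3783 training rows. It also lets profiles with different sampling schedules share a single model.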