
Quantitative Biology

New submissions

[ total of 21 entries: 1-21 ]

New submissions for Wed, 8 May 24

[1]  arXiv:2405.03707 [pdf, ps, other]
Title: Quantifying indirect and direct vaccination effects arising in the SIR model
Subjects: Populations and Evolution (q-bio.PE)

Vaccination campaigns have both direct and indirect effects that act to control an infectious disease as it spreads through a population. Indirect effects arise when vaccinated individuals block disease transmission in any infection chains they are part of, and this in turn can benefit both vaccinated and unvaccinated individuals. Indirect effects are difficult to quantify in practice, but here, working with the Susceptible-Infected-Recovered (SIR) model, they are analytically calculated in important cases, through pivoting on the Final Size formula for epidemics. Their relationship to herd immunity is also clarified. Furthermore, we identify the important distinction between quantifying indirect effects of vaccination at the "population level" versus the "per capita" individual level, which often results in radically different conclusions. As an important example, the analysis unpacks why the population-level indirect effect can appear significantly larger than its per capita analogue. In addition, we consider a recently proposed epidemiological non-pharmaceutical intervention used during COVID-19, referred to as "shielding", and study its impact in our mathematical analysis. The shielding scheme is extended by inclusion of limited vaccination.
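
As a rough illustration of the final-size reasoning this abstract refers to, the sketch below solves the standard SIR final-size relation z = (1 - v)(1 - exp(-R0 z)) and splits the infections averted into a direct and an indirect part. It is not taken from the paper: it assumes a perfect vaccine given to a fraction v before the epidemic and homogeneous mixing, and the function names and the particular direct/indirect decomposition are hypothetical choices made for this example.

```python
import math


def final_size(r0, v=0.0, tol=1e-12, max_iter=10_000):
    """Fraction of the whole population ever infected.

    Solves z = (1 - v) * (1 - exp(-r0 * z)) by fixed-point iteration.
    Assumes (illustratively) a perfect vaccine given to a fraction v
    before the epidemic starts, and homogeneous mixing.
    """
    z = 1.0 - v  # start at the upper bound so the iteration decreases to the largest root
    for _ in range(max_iter):
        z_next = (1.0 - v) * (1.0 - math.exp(-r0 * z))
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z  # not fully converged (can happen very close to the threshold)


def vaccination_effects(r0, v):
    """Illustrative split of infections averted (hypothetical definitions)."""
    z0 = final_size(r0)      # attack rate with no vaccination
    zv = final_size(r0, v)   # infected fraction of the whole population
    attack_unvacc = zv / (1.0 - v) if v < 1.0 else 0.0
    return {
        "total_averted": z0 - zv,
        # vaccinated people who would otherwise have been infected
        "direct": v * z0,
        # reduced risk among the unvaccinated, summed over that group
        "indirect_population": (1.0 - v) * (z0 - attack_unvacc),
        # the same reduction expressed per unvaccinated individual
        "indirect_per_capita": z0 - attack_unvacc,
    }


if __name__ == "__main__":
    print(vaccination_effects(r0=2.0, v=0.2))
```

Under these assumptions the direct and population-level indirect terms add up to the total infections averted, while the per capita figure measures the change in risk for a single unvaccinated individual, so the two indirect measures can differ in size.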

[2]  arXiv:2405.03726 [pdf, ps, other]
Title: sc-OTGM: Single-Cell Perturbation Modeling by Solving Optimal Mass Transport on the Manifold of Gaussian Mixtures
Comments: ICLR 2024, Machine Learning for Genomics Explorations Workshop
Subjects: Genomics (q-bio.GN); Machine Learning (cs.LG)

Influenced by breakthroughs in LLMs, single-cell foundation models are emerging. While these models show successful performance in cell type clustering, phenotype classification, and gene perturbation response prediction, it remains to be seen if a simpler model could achieve comparable or better results, especially with limited data. This is important, as the quantity and quality of single-cell data typically fall short of the standards in textual data used for training LLMs. Single-cell sequencing often suffers from technical artifacts, dropout events, and batch effects. These challenges are compounded in a weakly supervised setting, where the labels of cell states can be noisy, further complicating the analysis. To tackle these challenges, we present sc-OTGM, streamlined with fewer than 500K parameters, making it approximately 100x more compact than the foundation models, offering an efficient alternative. sc-OTGM is an unsupervised model grounded in the inductive bias that scRNA-seq data can be generated from a finite combination of multivariate Gaussian distributions. The core function of sc-OTGM is to create a probabilistic latent space utilizing a GMM as its prior distribution and distinguish between distinct cell populations by learning their respective marginal PDFs. It uses a Hit-and-Run Markov chain sampler to determine the OT plan across these PDFs within the GMM framework. We evaluated our model against a CRISPR-mediated perturbation dataset, called CROP-seq, consisting of 57 one-gene perturbations. Our results demonstrate that sc-OTGM is effective in cell state classification, aids in the analysis of differential gene expression, and ranks genes for target identification through a recommender system. It also predicts the effects of single-gene perturbations on downstream gene regulation and generates synthetic scRNA-seq data conditioned on specific cell states.

[3]  arXiv:2405.03829 [pdf, other]
Title: Unsupervised Machine Learning Identifies Latent Ultradian States in Multi-Modal Wearable Sensor Signals
Subjects: Neurons and Cognition (q-bio.NC)

Wearable sensors such as smartwatches have become ubiquitous in recent years, allowing the easy and continual measurement of physiological parameters such as heart rate, physical activity, body temperature, and blood glucose in an every-day setting. This multi-modal data offers the potential to identify latent states occurring across physiological measures, which may represent important bio-behavioural states that could not be observed in any single measure. Here we present an approach, utilising a hidden semi-Markov model, to identify such states in data collected using a smartwatch, electrocardiogram, and blood glucose monitor, over two weeks from a sample of 9 participants. We found 26 latent ultradian states across the sample, with many occurring at particular times of day. Here we describe some of these, as well as their association with subjective mood and time use diaries. These methods provide a novel avenue for developing insights into the physiology of everyday life.

[4]  arXiv:2405.03861 [pdf, other]
Title: Homeostasis in Input-Output Networks: Structure, Classification and Applications
Comments: 45 pages, 26 figures, submitted to the MBS special issue "Dynamical Systems in Life Sciences"
Subjects: Molecular Networks (q-bio.MN); Combinatorics (math.CO); Dynamical Systems (math.DS); Biological Physics (physics.bio-ph)

Homeostasis is concerned with regulatory mechanisms, present in biological systems, where some specific variable is kept close to a set value as some external disturbance affects the system. Mathematically, the notion of homeostasis can be formalized in terms of an input-output function that maps the parameter representing the external disturbance to the output variable that must be kept within a fairly narrow range. This observation inspired the introduction of the notion of infinitesimal homeostasis, namely, the derivative of the input-output function is zero at an isolated point. This point of view allows for the application of methods from singularity theory to characterize infinitesimal homeostasis points (i.e. critical points of the input-output function). In this paper we review the infinitesimal approach to the study of homeostasis in input-output networks. An input-output network is a network with two distinguished nodes, 'input' and 'output', and the dynamics of the network determines the corresponding input-output function of the system. This class of dynamical systems provides an appropriate framework to study homeostasis and several important biological systems can be formulated in this context. Moreover, this approach, coupled to graph-theoretic ideas from combinatorial matrix theory, provides a systematic way for classifying different types of homeostasis (homeostatic mechanisms) in input-output networks, in terms of the network topology. In turn, this leads to new mathematical concepts, such as homeostasis subnetworks, homeostasis patterns, and homeostasis mode interaction. We illustrate the usefulness of this theory with several biological examples: biochemical networks, chemical reaction networks (CRN), gene regulatory networks (GRN), intracellular metal ion regulation, and so on.
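
The definition of infinitesimal homeostasis given in this abstract can be written out as follows. The symbols are notation assumed for this sketch and are not quoted from the paper: $\mathcal{I}$ is the input (disturbance) parameter, $x_o(\mathcal{I})$ is the input-output function, and $\mathcal{I}_0$ is the isolated point. The Taylor expansion is the standard reason why a vanishing derivative keeps the output nearly constant close to $\mathcal{I}_0$.

```latex
% Input-output function: disturbance parameter -> output variable
\mathcal{I} \;\longmapsto\; x_o(\mathcal{I})

% Infinitesimal homeostasis at an isolated point \mathcal{I}_0
\frac{d x_o}{d \mathcal{I}}(\mathcal{I}_0) = 0

% Consequence: near \mathcal{I}_0 the output changes only at second order
x_o(\mathcal{I}) = x_o(\mathcal{I}_0)
  + \tfrac{1}{2}\, x_o''(\mathcal{I}_0)\,(\mathcal{I} - \mathcal{I}_0)^2
  + O\!\left(|\mathcal{I} - \mathcal{I}_0|^3\right)
```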

[5]  arXiv:2405.03913 [pdf, other]
Title: Digital Twin Calibration for Biological System-of-Systems: Cell Culture Manufacturing Process
Comments: 12 pages, 5 figures
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Machine Learning (stat.ML)

Biomanufacturing innovation relies on an efficient design of experiments (DoE) to optimize processes and product quality. Traditional DoE methods, ignoring the underlying bioprocessing mechanisms, often suffer from a lack of interpretability and sample efficiency. This limitation motivates us to create a new optimal learning approach that can guide sequential DoEs for digital twin model calibration. In this study, we consider a multi-scale mechanistic model for the cell culture process, also known as Biological Systems-of-Systems (Bio-SoS), as our digital twin. This model with modular design, composed of sub-models, allows us to integrate data across various production processes. To calibrate the Bio-SoS digital twin, we evaluate the mean squared error of model prediction and develop a computational approach to quantify the impact of parameter estimation error of individual sub-models on the prediction accuracy of the digital twin, which can guide sample-efficient and interpretable DoEs.

[6]  arXiv:2405.04011 [pdf, other]
Title: Adjoint Sensitivity Analysis on Multi-Scale Bioprocess Stochastic Reaction Network
Authors: Keilung Choy, Wei Xie
Comments: 11 pages, 2 figures
Subjects: Molecular Networks (q-bio.MN); Machine Learning (stat.ML)

Motivated by the pressing challenges in digital twin development for biomanufacturing processes, we introduce an adjoint sensitivity analysis (SA) approach to expedite the learning of mechanistic model parameters. In this paper, we consider enzymatic stochastic reaction networks representing a multi-scale bioprocess mechanistic model that allows us to integrate disparate data from diverse production processes and leverage the information from existing macro-kinetic and genome-scale models. To support forward prediction and backward reasoning, we develop a convergent adjoint SA algorithm studying how the perturbations of model parameters and inputs (e.g., initial state) propagate through enzymatic reaction networks and affect output trajectory predictions. This SA can provide a sample-efficient and interpretable way to assess the sensitivities between inputs and outputs, accounting for their causal dependencies. Our empirical study underscores the resilience of these sensitivities and shows how they deepen the understanding of the regulatory mechanisms behind the bioprocess.

[7]  arXiv:2405.04248 [pdf, ps, other]
Title: Neurocomputational Phenotypes in Female and Male Autistic Individuals
Comments: 10 pages, 2 figures, 4 tables. Submitted to Journal of Science and Health, University of Alabama
Subjects: Neurons and Cognition (q-bio.NC); Chaotic Dynamics (nlin.CD)

Autism Spectrum Disorder (ASD) is characterized by an altered phenotype in social interaction and communication. Additionally, autism typically manifests differently in females as opposed to males: a phenomenon that has likely led to long-term problems in diagnostics of autism in females. These sex-based differences in communicative behavior may originate from differences in neurocomputational properties of brain organization. The present study examined one neurocomputational measure of brain organization, the local power-law exponent, in autistic vs. neurotypical participants as well as in male vs. female participants. To investigate the autistic phenotype in neural organization based on biological sex, we collected continuous resting-state EEG data for 19 autistic young adults (10 F), and 23 controls (14 F), using a 64-channel Net Station EEG acquisition system. The data was analyzed to quantify the 1/f power spectrum. Correlations between power-law exponent and behavioral measures were calculated in a between-group (female vs. male; autistic vs. neurotypical) design. On average, the power-law exponent was significantly greater in the male ASD group than in the female ASD group in fronto-central regions. The differences were more pronounced over the left hemisphere, suggesting neural organization differences in regions responsible for language complexity. These differences provide a potential explanation for behavioral variances in female vs. male autistic young adults.
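
The abstract does not say how the power-law exponent of the 1/f spectrum was estimated. One common generic approach, shown here only as a sketch, is a least-squares line fit of log power against log frequency, with the exponent taken as the negative slope. The function name and the synthetic spectrum are hypothetical, and this global fit is not claimed to match the paper's "local" exponent.

```python
import math


def power_law_exponent(freqs, power):
    """Estimate beta in P(f) ~ c / f**beta by a log-log least-squares fit.

    Generic illustration only; not the estimation procedure of the paper.
    """
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in power]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope  # a spectrum falling as 1/f**beta has log-log slope -beta


if __name__ == "__main__":
    # Synthetic, noise-free spectrum with a known exponent of 1.5
    freqs = [float(f) for f in range(1, 41)]
    power = [3.0 / f ** 1.5 for f in freqs]
    print(power_law_exponent(freqs, power))
```

With real EEG spectra the fit range matters, since oscillatory peaks such as alpha rhythms sit on top of the 1/f background and can bias a plain line fit.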

Cross-lists for Wed, 8 May 24

[8]  arXiv:2405.03799 (cross-list from cs.LG) [pdf, other]
Title: Synthetic Data from Diffusion Models Improve Drug Discovery Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

Artificial intelligence (AI) is increasingly used in every stage of drug development. Continuing breakthroughs in AI-based methods for drug discovery require the creation, improvement, and refinement of drug discovery data. We posit a new data challenge that slows the advancement of drug discovery AI: datasets are often collected independently from each other, often with little overlap, creating data sparsity. Data sparsity makes data curation difficult for researchers looking to answer key research questions requiring values posed across multiple datasets. We propose a novel diffusion GNN model, Syngand, capable of generating ligand and pharmacokinetic data end-to-end. We provide a methodology for sampling pharmacokinetic data for existing ligands using our Syngand model. We show promising initial results on the efficacy of the Syngand-generated synthetic target property data on downstream regression tasks with AqSolDB, LD50, and hERG central. Using our proposed model and methodology, researchers can easily generate synthetic ligand data to help them explore research questions that require data spanning multiple datasets.

[9]  arXiv:2405.03879 (cross-list from stat.ML) [pdf, other]
Title: Scalable Amortized GPLVMs for Single Cell Transcriptomics Data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Genomics (q-bio.GN); Applications (stat.AP)

Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data. Gaussian Process Latent Variable Models (GPLVMs) offer an interpretable dimensionality reduction method, but current scalable models lack effectiveness in clustering cell types. We introduce an improved model, the amortized stochastic variational Bayesian GPLVM (BGPLVM), tailored for single-cell RNA-seq with specialized encoder, kernel, and likelihood designs. This model matches the performance of the leading single-cell variational inference (scVI) approach on synthetic and real-world COVID datasets and effectively incorporates cell-cycle and batch information to reveal more interpretable latent structures as we demonstrate on an innate immunity dataset.

[10]  arXiv:2405.03931 (cross-list from math.DS) [pdf, ps, other]
Title: Incorporating changeable attitudes toward vaccination into an SIR infectious disease model
Comments: 30 pages, 3 tables, 10 figures
Subjects: Dynamical Systems (math.DS); Populations and Evolution (q-bio.PE)

We develop a mechanistic model that classifies individuals both in terms of epidemiological status (SIR) and vaccination attitude (willing or unwilling), with the goal of discovering how disease spread is influenced by changing opinions about vaccination. Analysis of the model identifies existence and stability criteria for both disease-free and endemic disease equilibria. The analytical results, supported by numerical simulations, show that attitude changes induced by disease prevalence can destabilize endemic disease equilibria, resulting in limit cycles.

[11]  arXiv:2405.04078 (cross-list from cs.LG) [pdf, other]
Title: WISER: Weak supervISion and supErvised Representation learning to improve drug response prediction in cancer
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

Cancer, a leading cause of death globally, occurs due to genomic changes and manifests heterogeneously across patients. To advance research on personalized treatment strategies, the effectiveness of various drugs on cells derived from cancers ('cell lines') is experimentally determined in laboratory settings. Nevertheless, variations in the distribution of genomic data and drug responses between cell lines and humans arise due to biological and environmental differences. Moreover, while genomic profiles of many cancer patients are readily available, the scarcity of corresponding drug response data limits the ability to train machine learning models that can predict drug response in patients effectively. Recent cancer drug response prediction methods have largely followed the paradigm of unsupervised domain-invariant representation learning followed by a downstream drug response classification step. Introducing supervision in both stages is challenging due to heterogeneous patient response to drugs and limited drug response data. This paper addresses these challenges through a novel representation learning method in the first phase and weak supervision in the second. Experimental results on real patient data demonstrate the efficacy of our method (WISER) over state-of-the-art alternatives on predicting personalized drug response.

Replacements for Wed, 8 May 24

[12]  arXiv:1610.09637 (replaced) [pdf, ps, other]
Title: Nonequilibrium and nonlinear kinetics as key determinants for bistability in fission yeast G2-M transition
Authors: De Zhao (1 and 2), Teng Wang (1), Jian Zhao (2 and 3), Dianjie Li (1), Zhili Lin (1), Zeyan Chen (1), Qi Ouyang (1), Hong Qian (4), Yu V. Fu (2 and 3), Fangting Li (1) ((1) Peking University, Beijing, (2) Chinese Academy of Sciences, Beijing, (3) University of Chinese Academy of Sciences, Beijing,(4) University of Washington, Seattle)
Comments: 53 pages, 4 figures
Subjects: Molecular Networks (q-bio.MN); Subcellular Processes (q-bio.SC)
[13]  arXiv:2201.03193 (replaced) [pdf, ps, other]
Title: The impact of life-history strategies on the stability of competitive ecological network
Subjects: Populations and Evolution (q-bio.PE)
[14]  arXiv:2303.11833 (replaced) [pdf, other]
Title: Materials Discovery with Extreme Properties via Reinforcement Learning-Guided Combinatorial Chemistry
Authors: Hyunseung Kim (1), Haeyeon Choi (2), Dongju Kang (1), Won Bo Lee (1), Jonggeol Na (2) ((1) Seoul National University, (2) Ewha Womans University)
Comments: 18 pages, 8 figures
Journal-ref: Chemical Science, 2024
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
[15]  arXiv:2308.16093 (replaced) [pdf, other]
Title: Linking discrete and continuous models of cell birth and migration
Comments: 25 pages, 11 figures in main manuscript. 24 pages, 14 figures in supplementary information
Subjects: Cell Behavior (q-bio.CB)
[16]  arXiv:2309.02708 (replaced) [pdf, ps, other]
Title: Cooling down and waking up: feedback cooling switches an unconsciousness neural computer into a conscious quantum computer
Authors: Andrew Bell
Comments: 37 pages, 3 figures. Text reorganised; some text split off and placed at this https URL
Subjects: Neurons and Cognition (q-bio.NC); Biological Physics (physics.bio-ph)
[17]  arXiv:2310.09758 (replaced) [pdf, ps, other]
Title: Genome hybridization: A universal way for the origin and diversification of organelles as well as the origin and speciation of eukaryotes
Comments: 22 pages with two tables; added references for section 2; revised testable predictions for Section 5
Subjects: Other Quantitative Biology (q-bio.OT)
[18]  arXiv:2311.18142 (replaced) [pdf, other]
Title: Emergence of multiphase condensates from a limited set of chemical building blocks
Comments: Includes supplementary information
Subjects: Soft Condensed Matter (cond-mat.soft); Biological Physics (physics.bio-ph); Biomolecules (q-bio.BM)
[19]  arXiv:2401.13858 (replaced) [pdf, other]
Title: Graph Diffusion Transformer for Multi-Conditional Molecular Generation
Comments: 21 pages, 9 figures, 7 tables
Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM)
[20]  arXiv:2404.17626 (replaced) [pdf, other]
Title: Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Applications (stat.AP); Computation (stat.CO)
[21]  arXiv:2405.01015 (replaced) [pdf, other]
Title: Network reconstruction via the minimum description length principle
Authors: Tiago P. Peixoto
Comments: 17 pages, 10 figures. Code and documentation are available at this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Populations and Evolution (q-bio.PE)
