The Glycan Landscape of the Gut Microbiome
Bifidobacteria are among the earliest colonizers of the human gut, conferring numerous health benefits. While multiple Bifidobacterium strains are used as probiotics, accumulating evidence suggests that individual responses to probiotic supplementation vary, likely due to factors including strain type(s), gut community composition, dietary habits of the consumer, and other health/lifestyle conditions.
Given the saccharolytic nature of bifidobacteria, the carbohydrate composition of the diet is one of the primary factors dictating the colonization efficiency of Bifidobacterium strains. Therefore, a comprehensive understanding of bifidobacterial glycan metabolism at the strain level is necessary to rationally design probiotic or synbiotic formulations that combine bacterial strains with glycans that match their nutrient preferences.
In this study, we systematically reconstructed 66 pathways involved in the utilization of mono-, di-, oligo-, and polysaccharides by analyzing the representation of 565 curated metabolic functional roles (catabolic enzymes, transporters, transcriptional regulators) in 2973 non-redundant cultured Bifidobacterium isolates and metagenome-assembled genomes (MAGs). Our analysis uncovered substantial heterogeneity in the predicted glycan utilization capabilities at the species and strain level and revealed the presence of a yet undescribed phenotypically distinct subspecies-level clade within the Bifidobacterium longum species.
We also identified Bangladeshi isolates harboring unique gene clusters tentatively implicated in the breakdown of xyloglucan and human milk oligosaccharides. Selected predicted carbohydrate utilization phenotypes were experimentally characterized and validated. Our large-scale genomic analysis considerably expands the knowledge of carbohydrate metabolism in bifidobacteria and provides a foundation for rationally designing single- or multi-strain probiotic formulations of a given bifidobacterial species as well as synbiotic combinations of bifidobacterial strains matched with their preferred carbohydrate substrates.
The Adaptability of Bifidobacteria to Dietary Glycans
Bifidobacteria are Gram-positive saccharolytic microorganisms that predominantly inhabit the gastrointestinal tracts of various animals. Multiple Bifidobacterium species colonize the human gut throughout our lifespan, with many factors shaping this process, most notably the carbohydrate content of the diet.
For example, breastfeeding plays a pivotal role in fostering the dominance of specific Bifidobacterium species within the neonatal gut microbiota due to their evolutionary adaptation to metabolize milk glycans, such as human milk oligosaccharides (HMOs). Dietary changes throughout early life, namely the cessation of exclusive breastfeeding and the introduction of solid complementary foods, drive a gradual succession of bifidobacterial taxa from those tuned for HMO utilization to those more adapted to foraging dietary glycans (oligo- and polysaccharides) of plant origin.
Geographic/cultural differences in dietary practices profoundly influence the prevalence and abundance of specific Bifidobacterium taxa in gut microbial communities. For instance, Bifidobacterium longum subsp. infantis (Bl. infantis), a specialized HMO utilizer, can account for up to 90% of the gut microbiota of healthy breastfed infants from non-Westernized populations. In contrast, the neonatal microbiota of individuals from Westernized populations often lacks Bl. infantis and is instead enriched in less proficient HMO utilizers such as Bifidobacterium longum subsp. longum (Bl. longum), Bifidobacterium breve, and Bifidobacterium pseudocatenulatum, likely reflecting greater use of formula relative to breastfeeding and other cultural differences.
The Benefits of Bifidobacteria and the Need for Targeted Probiotic Formulations
Despite this variability in taxonomic composition, the overall predominance of Bifidobacterium in gut communities is widely associated with multiple health benefits, especially in infancy. For example, the end products of bifidobacterial carbohydrate metabolism, lactate and acetate, can inhibit pathogen colonization and serve as a basis for cross-feeding with various community members. Additionally, multiple Bifidobacterium species produce aromatic lactic acids and other metabolites that modulate the immune system. Given these beneficial properties, some bifidobacterial strains are commonly administered as probiotics.
Complementary strategies, such as prebiotic administration, are employed to facilitate the engraftment of probiotic strains. However, methods for the rational selection of specific probiotic strains and prebiotic glycans for different populations need to be developed further. For example, the engraftment of a commonly used probiotic Bl. infantis strain may not be sufficiently stable in Bangladeshi children who do not receive enough breast milk and consume supplementary diets with high plant glycan content. Instead, Bl. infantis strains isolated from members of the target population may be better adapted to the local diet due to distinctive genomic features related to the metabolism of glycans commonly consumed in that population.
Thus, a priori knowledge of the metabolic potential to utilize carbohydrates, especially in Bifidobacterium strains from understudied populations, may be instrumental in the rational development of locally adapted pro- and prebiotics.
Uncovering the Genomic Diversity of Bifidobacterial Carbohydrate Metabolism
Comparative genomics and related computational approaches have been widely employed to predict the carbohydrate utilization capabilities that underpin the metabolic preferences of bifidobacterial strains. These approaches include analyses of the distribution of Carbohydrate Active Enzymes (CAZymes), genotype-to-phenotype matching, and genome-scale metabolic models.
However, insufficiently detailed expert curation of gene annotations and carbohydrate utilization pathways is a common shortcoming of many previous studies. For example, glycan transporters and transcription factors are frequently overlooked in metabolic reconstructions. Moreover, most analyses have used genomic datasets that did not fully capture the diversity within the Bifidobacterium genus, especially at the strain level.
In this work, we reconstructed 66 metabolic pathways defining carbohydrate utilization capabilities (phenotypes) by analyzing the representation of a manually curated set of 565 metabolic functional roles (catabolic enzymes, CAZymes, transporters, and transcriptional regulators) in 263 non-redundant reference genomes of human-associated Bifidobacterium strains, including 19 novel isolates from Bangladeshi children.
Forty predicted glycan utilization phenotypes were tested and validated in vitro using a panel of 16 strains. The obtained genomic compendium was leveraged to develop and apply a machine learning model, which enhanced the ability to predict the distribution of carbohydrate utilization pathways in an additional 2710 non-redundant genomes of Bifidobacterium isolates or metagenome-assembled genomes (MAGs).
Uncovering Phenotypic and Genomic Diversity within the Bifidobacterium Genus
Our results uncovered remarkable variability of predicted glycan utilization capabilities at the species and strain levels and revealed the presence of a novel phenotypically distinct subspecies-level clade within the Bifidobacterium longum species. Furthermore, we identified several Bangladeshi Bifidobacterium strains harboring unique gene clusters predicted to enable the catabolism of plant hemicelluloses and HMO mixtures, suggesting an evolutionary adaptation to persistence in the intestinal ecosystems of children who were well into the weaning process.
The knowledge base obtained in our analyses enhances our understanding of carbohydrate metabolism and its diversification in bifidobacteria and provides a foundation for the rational development of pre-, pro-, and synbiotics. Our reconstruction covered a set of 263 non-redundant Bifidobacterium genomes of cultured isolates, including 19 novel strains from Bangladeshi children.
We first performed a pangenome analysis to identify 487 core genes shared by all genomes and constructed a maximum-likelihood phylogenetic tree based on the alignment of concatenated sequences of these genes to verify the taxonomic assignments of each strain. This phylogenomic analysis, combined with pairwise comparisons of average nucleotide identities (ANI) of the genomes, allowed us to refine taxonomic assignments for 24 strains and delineate subspecies-level clades within the Bifidobacterium longum and Bifidobacterium catenulatum species.
Overall, the genomes in the reference set were both taxonomically (19 species and subspecies) and geographically (20 countries) diverse.
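The core-gene step described above can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of ortholog-group identifiers; the strain names and group labels are hypothetical, and dedicated pangenome tools additionally handle paralogs and fragmented assemblies, which this sketch ignores.

```python
def core_genes(genome_to_groups):
    """Return the ortholog groups present in every genome (the core set)."""
    gene_sets = [set(groups) for groups in genome_to_groups.values()]
    if not gene_sets:
        return set()
    # A group is "core" only if it occurs in all genomes, i.e. the intersection.
    return set.intersection(*gene_sets)


# Hypothetical toy input: genome name -> ortholog groups detected in it
toy_pangenome = {
    "strain_A": {"og001", "og002", "og003"},
    "strain_B": {"og001", "og003", "og004"},
    "strain_C": {"og001", "og003"},
}

core = core_genes(toy_pangenome)
print(sorted(core))
```

In a workflow like the one described, the sequences of such core groups would then be aligned and concatenated before tree building; that part is outside the scope of this sketch.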
Leveraging Subsystems to Reconstruct Carbohydrate Utilization Pathways
We leveraged a subsystem-based comparative genomics approach to reconstruct carbohydrate utilization pathways and predict associated metabolic phenotypes. We first performed extensive mining of available literature and databases to identify 425 groups of orthologous carbohydrate utilization genes whose encoded protein products were either experimentally characterized in bifidobacteria or were homologous to proteins whose function had been established in other microbial taxa.
The analysis of genomic context combined with in silico reconstruction of transcriptional regulons allowed us to tentatively functionally annotate 140 additional groups of orthologous genes representing novel metabolic roles involved in glycan metabolism. As a result, we created a curated list of 565 roles that included 226 components of glycan-specific transporters, 188 catabolic CAZymes, 67 downstream catabolic enzymes, and 84 transcription factors.
Selected functional roles were used to reconstruct 21 pathways involved in the utilization of monosaccharides and their derivatives (sugar alcohols and acids), 35 pathways for the utilization of di- and oligosaccharides, and 10 pathways for polysaccharide degradation, for a total of 66 pathways. The reconstructed pathways captured 81.8% of all catabolic CAZymes identified in the 263 reference genomes.
Predicting Glycan Utilization Phenotypes
We designed specific rules associating the gene patterns of each carbohydrate utilization pathway with corresponding metabolic phenotypes. These “phenotype rules” described sets of genes whose presence or absence in a genome distinguished metabolic pathway variants.
Pathway variants were translated into binary phenotypes corresponding to utilizers of a specific glycan (“1”) and non-utilizers (“0”). The results were combined into a Binary Phenotype Matrix (BPM) that captured a representation of all predicted glycan utilization capabilities across the 263 reference Bifidobacterium strains.
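The idea of phenotype rules and a binary phenotype matrix can be sketched as follows. The rule structure (a set of required roles plus groups of interchangeable roles), the role names, and the strain names are illustrative assumptions; the actual rules used in this study are not reproduced here.

```python
def predict_phenotype(genome_roles, rule):
    """Return 1 if the genome satisfies the rule (utilizer), else 0.

    rule["required"]: roles that must all be present.
    rule["any_of"]:   groups of interchangeable roles; at least one role
                      from each group must be present.
    """
    has_required = all(role in genome_roles for role in rule["required"])
    has_alternatives = all(
        any(role in genome_roles for role in group) for group in rule["any_of"]
    )
    return int(has_required and has_alternatives)


def build_bpm(genomes, rules):
    """Build a strain -> {phenotype: 0/1} mapping, i.e. a small BPM."""
    return {
        strain: {name: predict_phenotype(roles, rule) for name, rule in rules.items()}
        for strain, roles in genomes.items()
    }


# Hypothetical rules: a catabolic enzyme plus at least one matching transporter
toy_rules = {
    "lactose": {
        "required": {"beta_galactosidase"},
        "any_of": [{"lac_permease", "lac_abc_transporter"}],
    },
    "xylooligosaccharides": {
        "required": {"beta_xylosidase", "xylose_isomerase"},
        "any_of": [{"xos_abc_transporter"}],
    },
}

# Hypothetical genomes reduced to the functional roles they encode
toy_genomes = {
    "strain_A": {"beta_galactosidase", "lac_permease"},
    "strain_B": {
        "beta_galactosidase", "lac_abc_transporter",
        "beta_xylosidase", "xylose_isomerase", "xos_abc_transporter",
    },
    "strain_C": {"beta_galactosidase"},  # enzyme present, transporter missing
}

bpm = build_bpm(toy_genomes, toy_rules)
for strain, row in bpm.items():
    print(strain, row)
```

Requiring a transporter in the rule reflects the point made earlier that transporters are often overlooked: here strain_C encodes the enzyme but is still scored as a non-utilizer.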
To initially assess the robustness of this approach, we compared the predicted carbohydrate utilization phenotypes with previously published in vitro growth data for six B. breve and 19 B. longum reference strains. The overall prediction accuracy was 95%, indicating the reliability of our computational framework.
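A comparison of this kind can be expressed as a simple agreement score. The sketch below assumes that both predictions and growth data are coded as 0/1 per strain and glycan, and that only pairs present in both tables are counted; the strain and glycan names are hypothetical, and the source does not specify how its accuracy figure was computed.

```python
def prediction_accuracy(predicted, observed):
    """Fraction of (strain, glycan) pairs where prediction equals observation.

    Only pairs present in both tables are compared.
    """
    matches = 0
    total = 0
    for strain, observed_row in observed.items():
        predicted_row = predicted.get(strain, {})
        for glycan, observed_value in observed_row.items():
            if glycan not in predicted_row:
                continue  # no prediction for this pair, so skip it
            total += 1
            matches += int(predicted_row[glycan] == observed_value)
    if total == 0:
        raise ValueError("no overlapping strain/glycan pairs to compare")
    return matches / total


# Hypothetical toy data: 1 = utilizer / growth, 0 = non-utilizer / no growth
toy_predicted = {
    "strain_1": {"lactose": 1, "xylan": 0},
    "strain_2": {"lactose": 1, "xylan": 1},
}
toy_observed = {
    "strain_1": {"lactose": 1, "xylan": 0},
    "strain_2": {"lactose": 1, "xylan": 0},
}

accuracy = prediction_accuracy(toy_predicted, toy_observed)
print(f"accuracy = {accuracy:.2f}")
```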
Uncovering Strain-level Diversity in Bifidobacterial Glycan Metabolism
The obtained BPM was used to analyze the distribution of predicted glycan utilization capabilities across the Bifidobacterium genus. Hierarchical clustering of the BPM highlighted the variation in conservation of carbohydrate utilization pathways.
Glucose, galactose, lactose, galacto-oligosaccharide, and fructose utilization pathways were present in >99% of the analyzed reference strains, forming the core sugar catabolic capabilities of human-colonizing bifidobacteria. Other pathways were more sporadically distributed and associated with distinct glycan preferences of different Bifidobacterium species.
For example, B. bifidum, B. breve, and Bl. infantis strains had multiple pathways involved in the utilization of host-derived glycans, such as HMOs and their constituents, whereas B. adolescentis, B. dentium, Bl. longum, and B. pseudocatenulatum had many pathways driving the catabolism of plant oligo- and polysaccharides.
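The distinction between core and sporadically distributed pathways comes down to per-pathway prevalence across strains. A minimal sketch, assuming the BPM is stored as a nested dictionary and using a strict cutoff of 99% to mirror the threshold quoted above (strain and pathway names are hypothetical):

```python
def pathway_prevalence(bpm):
    """Return pathway -> fraction of strains predicted to carry it."""
    strains = list(bpm)
    if not strains:
        return {}
    pathways = sorted({p for row in bpm.values() for p in row})
    return {
        p: sum(bpm[s].get(p, 0) for s in strains) / len(strains)
        for p in pathways
    }


def core_pathways(bpm, threshold=0.99):
    """Return pathways whose prevalence is strictly above the threshold."""
    prevalence = pathway_prevalence(bpm)
    return sorted(p for p, fraction in prevalence.items() if fraction > threshold)


# Hypothetical toy BPM: strain -> {pathway: 0/1}
toy_bpm = {
    "strain_A": {"glucose": 1, "lactose": 1, "HMO": 1, "arabinoxylan": 0},
    "strain_B": {"glucose": 1, "lactose": 1, "HMO": 1, "arabinoxylan": 0},
    "strain_C": {"glucose": 1, "lactose": 1, "HMO": 0, "arabinoxylan": 1},
    "strain_D": {"glucose": 1, "lactose": 1, "HMO": 0, "arabinoxylan": 1},
}

prevalence = pathway_prevalence(toy_bpm)
core = core_pathways(toy_bpm)
print(prevalence)
print(core)
```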
In addition to species-level differences, we observed a high degree of strain-level variability in predicted carbohydrate utilization capabilities. Namely, 55 out of 66 binary phenotypes varied between strains within at least one Bifidobacterium species. This strain-level heterogeneity suggested that a general carbohydrate utilization strategy characteristic of a particular Bifidobacterium species can often be tuned at the strain level.
For instance, Bl. infantis Bg064.S07_13.C6 harbored gene clusters encoding xylooligosaccharide and long-chain fructooligosaccharide utilization pathways, making this strain better adapted to metabolizing dietary glycans of plant origin than most other Bl. infantis strains.
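Counting phenotypes that vary within a species can be sketched as below. The criterion used here (a phenotype takes both values, 0 and 1, among strains of at least one species) is an assumed reading of the statement above, and the strain names, species labels, and phenotypes are hypothetical.

```python
from collections import defaultdict


def variable_phenotypes(bpm, species_of):
    """Return phenotypes that take both 0 and 1 within at least one species."""
    values_seen = defaultdict(set)  # (species, phenotype) -> observed values
    for strain, row in bpm.items():
        for phenotype, value in row.items():
            values_seen[(species_of[strain], phenotype)].add(value)
    return sorted(
        {phenotype for (_, phenotype), values in values_seen.items() if len(values) > 1}
    )


# Hypothetical toy data
toy_bpm = {
    "infantis_1": {"HMO": 1, "XOS": 0},
    "infantis_2": {"HMO": 1, "XOS": 1},  # strain-specific gain of XOS use
    "breve_1": {"HMO": 1, "XOS": 0},
    "breve_2": {"HMO": 1, "XOS": 0},
}
toy_species = {
    "infantis_1": "Bl. infantis",
    "infantis_2": "Bl. infantis",
    "breve_1": "B. breve",
    "breve_2": "B. breve",
}

variable = variable_phenotypes(toy_bpm, toy_species)
print(variable)
```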
Unveiling a Novel Bifidobacterium longum Subspecies Clade
The hierarchical clustering by predicted phenotypes did not completely mirror the order of branches on the phylogenetic tree based on core genes, indicating an incomplete correspondence between the set of shared phenotypes and phylogenetic relatedness between genomes within the Bifidobacterium genus.
For instance, strains belonging to the species within the B. adolescentis phylogenetic group clustered together based on predicted carbohydrate utilization capabilities. In contrast, predicted phenotypic profiles of strains belonging to Bl. infantis and Bl. longum, two phylogenetically related subspecies within the B. longum species, were strikingly different.
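One simple way to quantify how different two predicted phenotype profiles are is the normalized Hamming distance, i.e. the fraction of shared phenotypes on which two strains disagree. The source does not state which distance measure underlies its hierarchical clustering, so this choice, along with the profile contents below, is an illustrative assumption.

```python
def phenotype_distance(profile_a, profile_b):
    """Normalized Hamming distance over phenotypes present in both profiles."""
    shared = sorted(set(profile_a) & set(profile_b))
    if not shared:
        raise ValueError("profiles share no phenotypes")
    differing = sum(profile_a[p] != profile_b[p] for p in shared)
    return differing / len(shared)


# Hypothetical profiles loosely mimicking a host-glycan specialist
# and a plant-glycan specialist
infantis_like = {"HMO": 1, "LNB": 1, "arabinoxylan": 0, "XOS": 0}
longum_like = {"HMO": 0, "LNB": 1, "arabinoxylan": 1, "XOS": 1}

distance = phenotype_distance(infantis_like, longum_like)
print(f"distance = {distance:.2f}")
```

Comparing such phenotype distances with phylogenetic distances for the same strain pairs is one way to see where phenotype and phylogeny diverge, as in the Bl. infantis and Bl. longum case described above.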
Given the importance of B. longum in neonatal life, we performed a more focused analysis of pathway distribution within this heterogeneous species. Our phylogenomic and ANI-based analyses clustered B. longum genomes into four subspecies-level clades corresponding to Bl. infantis, Bl. longum, Bifidobacterium longum subsp. suis (Bl. suis), and a distinct clade termed hereafter Bl. nov.
Compared to other B. longum subspecies, the Bl. nov. clade had fewer predicted host glycan utilization pathways, suggesting a lineage-specific loss of genes and gene clusters. For example, all Bl. nov. strains lacked the lacto-N-biose/galacto-N-biose utilization pathway conserved in all other B. longum strains.
Conversely, only Bl. nov. strains encoded orthologs of extracellular α-amylase/pullulanase and a pathway driving the utilization of difructose dianhydrides. These differences in carbohydrate utilization capabilities, coupled with the ANI analysis, provide a basis for tentatively designating Bl. nov. as a novel subspecies within B. longum.
Bifidobacterium Species Display Distinct Glycan Utilization Profiles
Analysis of the representation of carbohydrate utilization pathways revealed stark differences between Bl. infantis and Bl. longum. The primary genomic features differentiating Bl. infantis strains were gene clusters encoding transporters and GHs contributing to the utilization of various human milk oligosaccharides.
In contrast, Bl. longum strains had multiple extracellular GHs conferring the ability to partially degrade type I and II arabinogalactan, arabinan, arabinoxylan, and gum arabic, emphasizing their adaptation to metabolize plant mono-, oligo-, and polysaccharides. Only a minority of Bl. longum strains were predicted to be able to internalize fucosylated HMOs.
We also observed substantial heterogeneity of predicted carbohydrate utilization phenotypes within Bl. suis. Some Bl. suis-like isolates clustered with Bl. longum based on the distribution of carbohydrate utilization pathways, while the outlier Bangladeshi Bl. suis strain Bg131.S11_17.F6 exhibited a Bl. infantis-type pattern, harboring gene clusters for the utilization of multiple HMOs but lacking plant glycan utilization pathways.
Bangladeshi Bifidobacterium Isolates Harbor Unique Glycan Utilization Capabilities
Our analysis uncovered additional genomic features of particular Bangladeshi Bifidobacterium strains that differentiated them from other reference strains of the same species. One such feature was a gene cluster in Bifidobacterium catenulatum subsp. kashiwanohense Bg42221_1E1 and Bg42221_1D3, but not in the non-Bangladeshi reference strains, that we hypothesized encodes the catabolism of the plant hemicellulose xyloglucan.
Another unique cluster identified in the Bangladeshi Bl. suis strain Bg131.S11_17.F6 contained orthologs of genes involved in the utilization of a broad range of human milk oligosaccharides, suggesting evolutionary adaptation to the intestinal ecosystems of children undergoing the weaning process.
To experimentally validate our computational predictions, we characterize