The G4REP server is a computational tool that assesses the potential of a protein to bind G-quadruplex (G4) structures. It assigns each protein a G4 binding score, derived from a machine learning model trained on relevant protein features. Beyond this quantitative measure of binding potential, the server highlights the specific regions of the protein that are most likely to interact with G4s. These regions are predicted from two properties: intrinsic disorder, which is often associated with protein-G4 interactions, and amino acid composition, which plays a critical role in defining binding affinity.
G4REP accepts two types of input to process sequences.
For single sequence input, you can either provide a UniProt ID or paste a raw amino acid sequence directly (without including a FASTA header).
For multiple sequences, you can upload a FASTA file containing all the sequences you wish to analyze. Ensure the FASTA file is correctly formatted, with each sequence having a corresponding header line starting with ">".
This flexibility allows users to easily analyze individual sequences or batch process multiple sequences at once.
The minimum accepted sequence length is 20 amino acids to ensure reliable embedding generation and scoring.
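The input rules above can be checked locally before submitting a batch. The following is a minimal sketch, not the server's own code: the function names `parse_fasta` and `split_by_length` and the example records are hypothetical, and only the 20-residue minimum and the ">" header convention come from the description above.

```python
MIN_LENGTH = 20  # minimum accepted sequence length stated above


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_by_length(records, min_length=MIN_LENGTH):
    """Separate records that meet the minimum length from those that do not."""
    accepted = [(h, s) for h, s in records if len(s) >= min_length]
    rejected = [(h, s) for h, s in records if len(s) < min_length]
    return accepted, rejected


# Made-up example input: one 21-residue sequence and one that is too short.
example = """>protein_a hypothetical example
GRGGSYGGRGGFSGRGGYGGS
>protein_b too short
MKT
"""
accepted, rejected = split_by_length(parse_fasta(example))
print([h for h, _ in accepted])
print([h for h, _ in rejected])
```

Reporting the rejected records separately, rather than dropping them silently, makes it easier to see which entries would fall below the length limit.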
The server returns results in a tabular format, where each row represents a protein entry. Each entry carries a prediction value (G4REP Model Binding Score) indicating how likely the protein is to bind G4 structures. For high-scoring predictions (G4REP Model Binding Score > 0.8), a dedicated plot is generated showing:
Disorder score: This represents the predicted structural flexibility of each residue, obtained from a neural network–based disorder predictor (Erdos et al., 2024). Higher values indicate more flexible or disordered regions within the protein sequence.
G4REP score: This indicates how likely a protein region is to interact with G-quadruplex structures, based on flexibility and residue content. It is defined for each window as:
$$ \mathrm{G4REP} = \Phi \left( \sum_{i=1}^{n} d_i,\; \frac{\sum_{j=1}^{n} 1_j}{n} \right) $$
where \(n\) is the number of residues in the window, \(d_i\) is the disorder score of residue \(i\), and \(1_j\) equals 1 if residue \(j\) belongs to the G4-binding set \(\{G,S,Y,F,R\}\) and 0 otherwise. The operator \(\Phi\) combines summed disorder and motif density into a single scalar ranging from 0 up to the maximum average disorder. To filter out low-scoring regions, we apply a gating function:
$$ F(G, B) = \begin{cases} 0, & G < T \\ B, & G \ge T \end{cases} $$
where \(G\) is the G4REP Score, \(B\) is the G4REP Model Binding Score, and \(T\) is an empirically chosen threshold. Only windows where \(G \ge T\) pass through for further RG4-binding analysis, producing a focused set of candidate regions defined by both structural flexibility and enriched G–S–Y–F–R content.
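The windowed score and the gating step can be illustrated with a short sketch. The exact form of \(\Phi\), the window length, and the threshold \(T\) are not given here, so the code makes loud assumptions: \(\Phi\) is taken to be the product of a window's mean disorder and its G/S/Y/F/R fraction (one choice that stays between 0 and the maximum average disorder), and the window of 5, the threshold of 0.5, the binding score of 0.9, and the disorder values are all made up for illustration.

```python
G4_RESIDUES = frozenset("GSYFR")  # residue set named in the formula above


def window_scores(sequence, disorder, window=5):
    """Score each sliding window as mean disorder times G/S/Y/F/R fraction.

    ASSUMPTION: this product stands in for the unspecified operator Phi.
    """
    if len(sequence) != len(disorder):
        raise ValueError("need one disorder value per residue")
    scores = []
    for start in range(len(sequence) - window + 1):
        residues = sequence[start:start + window]
        mean_disorder = sum(disorder[start:start + window]) / window
        motif_fraction = sum(r in G4_RESIDUES for r in residues) / window
        scores.append(mean_disorder * motif_fraction)
    return scores


def gate(g_score, binding_score, threshold):
    """Gating function F(G, B): 0 if G < T, otherwise B."""
    return binding_score if g_score >= threshold else 0.0


# Hypothetical protein: an ordered alanine stretch followed by a
# disordered, G/R/S-rich tail. The disorder values are invented.
sequence = "AAAAAGRGGS"
disorder = [0.0] * 5 + [1.0] * 5
scores = window_scores(sequence, disorder, window=5)
gated = [gate(g, binding_score=0.9, threshold=0.5) for g in scores]
print(gated)
```

In this toy input only the last two windows, which overlap the disordered G/R/S-rich tail, reach the threshold and keep the binding score; the remaining windows are gated to 0.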