Abstract
Motivation
Finding proteins with specific functions by mining modern databases can potentially lead to substantial advancements in a wide range of fields, from medicine and biotechnology to materials science. Currently available algorithms enable mining of proteins based on their sequence or structure. However, the activities of many proteins, such as enzymes and drug targets, are dictated by active site residues and their surroundings rather than by the overall structure or sequence of the protein.
Results
We introduce ActSeek, a fast, computer vision-inspired program that searches structural databases for proteins whose active sites are similar to that of a seed protein. ActSeek is implemented to mine proteins with desired active site environments from the AlphaFold database. Its potential to help address pressing global challenges is demonstrated by finding enzymes that may be used to produce biodegradable plastics or to degrade plastics, as well as potential off-targets for common drug molecules.
Availability and implementation
ActSeek source code is available at https://github.com/vttresearch/ActSeek under a Non-Commercial License Agreement.
| Original language | English |
|---|---|
| Article number | btaf424 |
| Journal | Bioinformatics |
| Volume | 41 |
| Issue number | 8 |
| DOIs | |
| Publication status | Published - Aug 2025 |
| MoE publication type | A1 Journal article-refereed |
Funding
This work was supported by the Jane and Aatos Erkko Foundation (JAES) under project 220048 (Virtual laboratory for Biodesign, JAESBIODESIGN) and by the Research Council of Finland (358505 and 356568).
Keywords
- Algorithms
- Databases, Protein
- Catalytic Domain
- Software
- Proteins/chemistry
- Data Mining/methods
- Computational Biology/methods