Methodological approach for the Palm Research Synthesis paper: combine manual ground-truth cluster definitions (via deep-reading 137 faculty profiles) with Scopus-derived co-authorship network analysis (Louvain community detection at multiple resolution parameters). Pre-register predictions before Scopus data arrives.
Why combine manual + algorithmic clustering
Most scientometric papers use one of two approaches:
- Pure algorithmic clustering: extract a co-authorship network from Scopus, run Louvain or similar community-detection, interpret the resulting communities. Strengths: reproducible, scalable. Weaknesses: clusters often don’t match domain expert intuition; sensitive to resolution parameter; misses methodological vs topical distinctions.
- Pure manual clustering: domain experts define clusters by reading the literature. Strengths: captures expert intuition + methodological nuance. Weaknesses: labor-intensive, hard to scale, subjective.
This paper uses both in sequence:
- Define manual ground-truth clusters by deep-reading 137 faculty profiles + curriculum + research-area mappings.
- Pre-register predictions: which Louvain γ values will recover the manual clusters; which bridge researchers will have highest betweenness centrality; which sub-clusters will emerge at higher γ.
- Run Scopus-based Louvain analysis after pre-registration.
- Compare manual vs algorithmic — concordance validates manual ground truth; discordance identifies cluster boundary uncertainties.
Data sources
Manual ground truth (this paper)
- 137 faculty profiles at Chulalongkorn University Faculty of Veterinary Science.
- Faculty profiles drawn from public chula.ac.th profiles + supplemented by Scopus / ResearchGate / PubMed where research areas need confirmation.
- Manual cluster assignment based on deep-read of:
  - Stated research areas in faculty bios.
  - Recent publication topics (2018-2025 emphasis).
  - Center / unit affiliations.
  - PhD origin lineages (training-school clusters).
13 manual clusters defined: PRRSV (1), CU-EIDAs (2), AHRU Poultry (3), CU-ARM (4), CE-FID Aquatic (5), CU-AF Theriogenology (6), SLU Repro (7), Wildlife ART (8), Cancer Mol Dx (9), Cardiac (10), Pathology Biomarker (11), Stem Cells (12), Vector / Parasitology (13). One residual unmapped cluster (0).
Algorithmic validation (Scopus extraction, pending)
- Scopus author IDs for all 137 PIs (manual mapping in progress).
- Co-authorship matrix from 2014-2024 publications.
- Louvain community detection at γ = 0.5, 1.0, 1.5, 2.0 — multiple resolutions to test cluster sensitivity.
- Betweenness centrality computation per PI.
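The role of γ can be made explicit. The source does not say which Louvain implementation will be used; assuming it optimizes the commonly used generalized modularity with a resolution parameter, the objective is:

```latex
Q(\gamma) = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \gamma \, \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

Here A_ij is the co-authorship weight between PIs i and j, k_i is the weighted degree of PI i, m is the total edge weight, and δ(c_i, c_j) is 1 when both PIs are assigned to the same community and 0 otherwise. Setting γ = 1 gives standard modularity. Values above 1 increase the penalty term and tend to produce more, smaller communities; values below 1 tend to produce fewer, larger ones.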
Pre-registered hypotheses
Before Scopus extraction completes, the following predictions are pre-registered:
H1: Cluster recovery
At Louvain γ = 1.0, the algorithm will recover ≥9 of the 13 manual clusters with ≥70% concordance.
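H1 does not define "concordance" operationally. One possible reading, sketched below in Python, treats it as the share of a manual cluster's members that fall into that cluster's single best-matching algorithmic community. This definition, the function names, and the toy data are assumptions for illustration, not the paper's stated metric.

```python
from collections import Counter

def cluster_concordance(manual, algorithmic, exclude=(0,)):
    """Per manual cluster: share of members in its best-matching community.

    manual and algorithmic map author ID -> cluster / community label.
    Authors missing from `algorithmic` still count in the denominator.
    The residual cluster (0) is excluded by default.
    """
    members = {}
    for author, cluster in manual.items():
        if cluster not in exclude:
            members.setdefault(cluster, []).append(author)
    scores = {}
    for cluster, authors in members.items():
        counts = Counter(algorithmic[a] for a in authors if a in algorithmic)
        best = counts.most_common(1)[0][1] if counts else 0
        scores[cluster] = best / len(authors)
    return scores

def h1_supported(scores, min_clusters=9, threshold=0.70):
    """Return (verdict, recovered clusters) for the H1 thresholds."""
    recovered = sorted(c for c, s in scores.items() if s >= threshold)
    return len(recovered) >= min_clusters, recovered

# Toy data: 7 hypothetical authors, 2 manual clusters plus the residual.
manual = {"A1": 1, "A2": 1, "A3": 1, "A4": 1, "B1": 2, "B2": 2, "C1": 0}
algorithmic = {"A1": "x", "A2": "x", "A3": "x", "A4": "y",
               "B1": "y", "B2": "z", "C1": "z"}
scores = cluster_concordance(manual, algorithmic)
verdict, recovered = h1_supported(scores)
```

This measure is one-directional: a single oversized community that swallows several manual clusters would still score well, so it is worth reporting alongside a symmetric measure such as the Jaccard overlap.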
H2: Bridge researchers (top-betweenness)
8 named researchers (predicted from manual analysis) will be in the top 10 by betweenness centrality:
- 6 cross-cluster bridges (named in vault synthesis)
- 2 methodologists (biostat / livestock epidemiology)
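To make the H2 check concrete, the sketch below computes unnormalized betweenness centrality with Brandes' algorithm on an unweighted, undirected graph and counts how many predicted bridges land in the top k. It uses only the Python standard library; in practice a graph library would likely be used instead. The adjacency list, the node names, and the helper `top_k_hits` are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected shortest path was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

def top_k_hits(scores, predicted, k=10):
    """Predicted bridges that appear among the k highest-scoring nodes."""
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return sorted(set(predicted) & set(ranked[:k]))

# Toy graph: a chain a-b-c attached to a triangle c-d-e.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d", "e"],
       "d": ["c", "e"], "e": ["c", "d"]}
scores = betweenness(adj)
hits = top_k_hits(scores, predicted=["c", "a"], k=2)
```

For H2, `predicted` would hold the 8 named researchers and `k` would be 10; the hypothesis is met when all 8 appear in `hits`.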
H3: PRRSV-CU-EIDAs overlap
PRRSV (cluster 1) and CU-EIDAs (cluster 2) will show high inter-cluster edge density due to shared anchor researchers (e.g., the senior PI who founded SVEVR is also affiliated with CU-EIDAs).
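H3 leaves "high inter-cluster edge density" unquantified. A minimal sketch, assuming density is taken as the fraction of possible cross-cluster author pairs that share at least one paper; the function name and toy data are hypothetical.

```python
def inter_cluster_density(edges, manual, a, b):
    """Fraction of possible a-b author pairs that co-authored at least once.

    edges: iterable of (author, author) pairs; manual: author -> cluster.
    Assumes a != b.
    """
    size_a = sum(1 for c in manual.values() if c == a)
    size_b = sum(1 for c in manual.values() if c == b)
    if size_a == 0 or size_b == 0:
        return 0.0
    linked = {
        frozenset((u, v))               # ignore direction and duplicate edges
        for u, v in edges
        if {manual.get(u), manual.get(v)} == {a, b}
    }
    return len(linked) / (size_a * size_b)

# Toy data: two PRRSV authors (1), two CU-EIDAs authors (2), one Cardiac author (10).
manual = {"P1": 1, "P2": 1, "E1": 2, "E2": 2, "K1": 10}
edges = [("P1", "E1"), ("P2", "E1"), ("P1", "P2"), ("E2", "K1")]
d_1_2 = inter_cluster_density(edges, manual, 1, 2)
d_1_10 = inter_cluster_density(edges, manual, 1, 10)
```

One way to decide what counts as "high" would be to compare the PRRSV and CU-EIDAs value against the distribution of densities over all other cluster pairs; that baseline is a suggestion, not part of the pre-registration.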
H4: Methodologically-related clusters merge at low γ
At γ = 0.5, clusters with shared methodology (CAC-RU + Cardiac via proteomics; CU-ARM + CE-FID via WGS+AMR) will merge into method-driven super-clusters.
H5: Hidden sub-clusters at high γ
At γ = 1.5-2.0, 5 manually-identified clusters will split into predicted sub-structure:
- Pathology Biomarker (11) → 4 sub-clusters
- CU-AF Theriogenology (6) → 3 sub-clusters
- Cancer Mol Dx (9) → 2 sub-clusters
- CU-ARM (4) → 3 sub-clusters
- Cardiac (10) → 2 sub-clusters
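The predicted splits can be checked by counting how many high-γ communities each manual cluster is spread across. The sketch below assumes a community counts as a sub-cluster only if it holds at least two members of the manual cluster; that threshold, the function name, and the toy assignments are assumptions.

```python
from collections import Counter

def observed_splits(manual, algorithmic, cluster, min_members=2):
    """Number of communities holding at least min_members of one manual cluster."""
    counts = Counter(
        algorithmic[a]
        for a, c in manual.items()
        if c == cluster and a in algorithmic
    )
    return sum(1 for n in counts.values() if n >= min_members)

# Toy data: five Cancer Mol Dx authors (9) and two Cardiac authors (10).
manual = {"C1": 9, "C2": 9, "C3": 9, "C4": 9, "C5": 9, "H1": 10, "H2": 10}
high_gamma = {"C1": "x", "C2": "x", "C3": "y", "C4": "y",
              "C5": "z", "H1": "z", "H2": "z"}
predicted = {9: 2, 10: 2}  # predicted sub-cluster counts from H5 for these two clusters
report = {
    c: {"predicted": p, "observed": observed_splits(manual, high_gamma, c)}
    for c, p in predicted.items()
}
```

In this toy case cluster 9 matches its prediction and cluster 10 does not, which is the kind of per-cluster outcome the hits and misses report would record.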
H6: Cold spots = research-teaching gaps
Curriculum topics with no clear faculty research backing (~30% of all curriculum.js topics) will correlate with clusters scoring 3.5-4/6 on the maturity rubric.
Sensitivity analysis
Multiple sensitivity tests are pre-registered:
- Resolution sensitivity: Louvain at γ ∈ {0.5, 1.0, 1.5, 2.0} → which γ best recovers manual clusters?
- Method-based re-clustering: instead of clustering by topic, cluster by method signature (proteomics, WGS, cryobiology, etc.) — does the network look different?
- Time-window sensitivity: 2014-2018 vs 2019-2024 sub-windows → does the cluster structure shift?
- Author-disambiguation sensitivity: Scopus author ID merging is imperfect; sensitivity to merge errors should be reported.
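The time-window test needs a separate co-authorship network for each sub-window. A minimal Python sketch, assuming publication records are available as dictionaries with `year` and `author_ids` fields (both field names, like the function name, are hypothetical and depend on how the Scopus export is parsed):

```python
from collections import Counter
from itertools import combinations

def coauthorship_edges(publications, roster, start_year, end_year):
    """Count co-authored papers per pair of roster authors within a year window."""
    roster = set(roster)
    weights = Counter()
    for pub in publications:
        if not (start_year <= pub["year"] <= end_year):
            continue
        # Keep only roster authors, deduplicated and in a stable order.
        authors = sorted(set(pub["author_ids"]) & roster)
        for a, b in combinations(authors, 2):
            weights[(a, b)] += 1
    return weights

# Hypothetical records standing in for the Scopus export.
pubs = [
    {"year": 2016, "author_ids": ["A1", "A2", "A3"]},
    {"year": 2020, "author_ids": ["A1", "A2"]},
    {"year": 2012, "author_ids": ["A2", "A3"]},  # outside 2014-2024
    {"year": 2021, "author_ids": ["A3", "X9"]},  # X9 is not on the roster
]
roster = ["A1", "A2", "A3"]
full = coauthorship_edges(pubs, roster, 2014, 2024)
early = coauthorship_edges(pubs, roster, 2014, 2018)
late = coauthorship_edges(pubs, roster, 2019, 2024)
```

Running community detection separately on `early` and `late` and comparing the partitions would show whether the cluster structure shifts between windows. The same function can also support the author-disambiguation test by rerunning it after deliberately merging or splitting a few author IDs.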
Validation procedure
When Scopus data arrives:
- Test pre-registered hypotheses against actual data.
- Hits + misses report: publish all pre-registered predictions and outcomes (both successes and failures), with no post-hoc revision.
- Cluster-by-cluster narrative: per cluster, report manual definition vs algorithmic recovery + interpretation.
- Bridge researcher analysis: report betweenness rankings; compare to predicted bridges.
- Sub-cluster validation: at high γ, compare predicted sub-cluster splits vs algorithmic results.
Why pre-registration matters here
Bibliometric studies are vulnerable to post-hoc cluster naming — researchers run clustering, see the output, then label clusters in a way that fits the data. This produces “high concordance” but tells us nothing about the predictive validity of the clustering approach.
Pre-registering manual cluster definitions + predictions before seeing Scopus data avoids this. The paper’s contribution depends on:
- How accurate the manual ground-truth turned out to be.
- Where manual + algorithmic disagree (and why).
- Which pre-registered predictions were validated vs falsified.
Implications
If the methodology validates (high concordance + most pre-registered hypotheses confirmed):
- The manual ground-truth approach has external validity for other faculties.
- The 6-marker maturity rubric can be applied beyond Chula Vet.
- Bridge-researcher prediction from public profiles is reliable.
If the methodology fails (low concordance + most predictions falsified):
- Pure algorithmic clustering is more reliable than manual.
- The maturity rubric needs revision (or doesn’t generalize).
- Manual analysis adds little above Scopus-based clustering.
Either outcome is publishable — both successes and failures of the methodology generate evidence about scientometric methods’ real-world reliability.
Citation
Danoi, A. (2026). Co-Authorship Network Analysis for Veterinary Research Clusters — Methodology. Working pre-print retrieved from {URL}/research/methodology/co-authorship-network-analysis.
Methodology updates and validation reports will be published as the Scopus analysis progresses.
Limitations
- Manual cluster definitions are the work of one analyst (the paper’s first author) — inter-analyst reliability not yet tested.
- Scopus coverage of Thai-language vet publications may be incomplete; some research output may be missing from the analysis.
- Co-authorship as a proxy for collaboration omits non-publishing collaborations (committee work, informal mentorship).
- The 137 faculty profiles are a snapshot of the current Chula Vet roster; future hires will require re-analysis.