Co-Authorship Network Analysis for Veterinary Research Clusters — Methodology

This note sets out the methodological approach for the Palm Research Synthesis paper. Manual ground-truth cluster definitions, built by deep-reading 137 faculty profiles, are combined with Scopus-derived co-authorship network analysis (Louvain community detection at multiple resolution parameters). All predictions are pre-registered before the Scopus data arrives.

Why combine manual + algorithmic clustering

Most scientometric papers use one of two approaches:

Pure algorithmic clustering: extract a co-authorship network from Scopus, run Louvain or a similar community-detection algorithm, and interpret the resulting communities. Strengths: reproducible, scalable. Weaknesses: clusters often don’t match domain-expert intuition; results are sensitive to the resolution parameter; methodological vs topical distinctions are missed.
Pure manual clustering: domain experts define clusters by reading the literature. Strengths: captures expert intuition + methodological nuance. Weaknesses: labor-intensive, hard to scale, subjective.

This paper uses both in sequence:

Define manual ground-truth clusters by deep-reading 137 faculty profiles + curriculum + research-area mappings.
Pre-register predictions: which Louvain γ values will recover the manual clusters; which bridge researchers will have highest betweenness centrality; which sub-clusters will emerge at higher γ.
Run Scopus-based Louvain analysis after pre-registration.
Compare manual vs algorithmic — concordance validates manual ground truth; discordance identifies cluster boundary uncertainties.

Data sources

Manual ground truth (this paper)

137 faculty profiles at Chulalongkorn University Faculty of Veterinary Science.
Faculty profiles are drawn from the public chula.ac.th profiles, supplemented by Scopus / ResearchGate / PubMed where research areas need confirmation.
Manual cluster assignment is based on a deep read of:
- Stated research areas in faculty bios.
- Recent publication topics (2018-2025 emphasis).
- Center / unit affiliations.
- PhD origin lineages (training-school clusters).

13 manual clusters defined: PRRSV (1), CU-EIDAs (2), AHRU Poultry (3), CU-ARM (4), CE-FID Aquatic (5), CU-AF Theriogenology (6), SLU Repro (7), Wildlife ART (8), Cancer Mol Dx (9), Cardiac (10), Pathology Biomarker (11), Stem Cells (12), Vector / Parasitology (13). One residual unmapped cluster (0).

Algorithmic validation (Scopus extraction, pending)

Scopus author IDs for all 137 PIs (manual mapping in progress).
Co-authorship matrix from 2014-2024 publications.
Louvain community detection at γ = 0.5, 1.0, 1.5, 2.0 — multiple resolutions to test cluster sensitivity.
Betweenness centrality computation per PI.
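The extraction steps above can be sketched in plain Python. This is a minimal illustration, not the pipeline itself: it assumes each Scopus record has been reduced to a (year, [author IDs]) pair restricted to the in-scope PIs, and that an edge's weight is the number of co-authored papers in the 2014-2024 window (both hypothetical choices). It also scores a partition by modularity with a resolution parameter γ, the objective Louvain maximizes, which is what gives the γ sweep its meaning. The Louvain search itself is not reimplemented; networkx 2.8+ exposes it as `louvain_communities(G, resolution=..., seed=...)`, and fixing `seed` keeps repeated runs comparable.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_graph(pubs, start=2014, end=2024):
    """Build a weighted, undirected co-authorship graph.

    pubs: iterable of (year, [author_id, ...]) records (hypothetical format).
    Returns {(a, b): weight} with a < b, where weight is the number of
    publications in [start, end] that a and b co-authored.
    """
    weights = defaultdict(float)
    for year, authors in pubs:
        if not start <= year <= end:
            continue  # outside the analysis window
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1.0
    return dict(weights)


def modularity(weights, partition, gamma=1.0):
    """Modularity with resolution gamma for a {node: community} partition.

    Q = sum_c [ L_c / m - gamma * (d_c / (2m))**2 ]
    L_c: total weight of edges inside community c
    d_c: total weighted degree of the nodes in c
    m:   total edge weight of the graph
    Lower gamma shrinks the penalty term, so larger communities score better.
    """
    m = sum(weights.values())
    internal = defaultdict(float)
    degree = defaultdict(float)
    for (a, b), w in weights.items():
        ca, cb = partition[a], partition[b]
        degree[ca] += w
        degree[cb] += w
        if ca == cb:
            internal[ca] += w
    return sum(internal[c] / m - gamma * (degree[c] / (2 * m)) ** 2
               for c in degree)
```

For example, on two triangles joined by a single edge (m = 7), the two-triangle split scores 6/7 - γ/2 and the single merged community scores 1 - γ, so merging wins only below γ = 2/7. On the real network the same trade-off decides which manual clusters fuse at low γ and which split at high γ.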

Pre-registered hypotheses

Before Scopus extraction completes, the following predictions are pre-registered:

H1: Cluster recovery

At Louvain γ = 1.0, the algorithm will recover ≥9 of the 13 manual clusters with ≥70% concordance.
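H1 leaves "concordance" to be operationalized. One simple reading, assumed here rather than taken from the pre-registration, is the share of a manual cluster's PIs that land in its single best-matching Louvain community; a cluster counts as recovered when that share is at least 0.70, and the residual cluster 0 is excluded. PIs without a Scopus ID count against recovery. This measure lets one large community "recover" two manual clusters at once, so a stricter one-to-one matching may be preferable.

```python
from collections import Counter, defaultdict


def cluster_concordance(manual, algorithmic, exclude=(0,)):
    """Per manual cluster, the share of its PIs found in its single
    best-matching algorithmic community.

    manual:      {pi: manual_cluster_id}
    algorithmic: {pi: louvain_community_id}; PIs missing here (e.g. no
                 Scopus ID yet) count as unmatched.
    exclude:     manual cluster IDs to skip (0 = residual, unmapped).
    """
    members = defaultdict(list)
    for pi, cluster in manual.items():
        if cluster not in exclude:
            members[cluster].append(pi)
    scores = {}
    for cluster, pis in members.items():
        counts = Counter(algorithmic[pi] for pi in pis if pi in algorithmic)
        best = max(counts.values()) if counts else 0
        scores[cluster] = best / len(pis)
    return scores


def h1_holds(scores, threshold=0.70, min_recovered=9):
    """H1 as stated: at least `min_recovered` clusters reach `threshold`.

    Returns (hypothesis_supported, number_of_recovered_clusters).
    """
    recovered = sum(1 for s in scores.values() if s >= threshold)
    return recovered >= min_recovered, recovered
```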

H2: Bridge researchers (top-betweenness)

8 named researchers (predicted from manual analysis) will be in the top 10 by betweenness centrality:

6 cross-cluster bridges (named in vault synthesis)
2 methodologists (biostat / livestock epidemiology)
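Betweenness for H2 can be computed with Brandes' algorithm. The sketch below assumes an unweighted graph over PIs only; whether external co-authors are included, and whether edge weights are used as distances, are open choices not fixed above. networkx's `betweenness_centrality` is the usual library equivalent. The predicted names live in the vault synthesis, so any IDs passed to `bridge_hits` here are placeholders.

```python
from collections import defaultdict, deque


def adjacency(edges):
    """{node: set(neighbours)} from an iterable of undirected (a, b) edges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def betweenness(adj):
    """Brandes betweenness for an unweighted, undirected graph.

    Scores are unnormalized; the final halving corrects for each
    undirected shortest path being counted once from each endpoint.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s: shortest-path counts (sigma) and predecessors.
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}


def bridge_hits(bc, predicted, top_n=10):
    """How many predicted bridges rank in the top_n (ties broken by ID)."""
    ranked = sorted(bc, key=lambda v: (-bc[v], v))[:top_n]
    return len(set(predicted) & set(ranked)), ranked
```

Under H2 as written, `bridge_hits(bc, predicted_eight)` would need to return 8 hits.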

H3: PRRSV-CU-EIDAs overlap

PRRSV (cluster 1) and CU-EIDAs (cluster 2) will show high inter-cluster edge density due to shared anchor researchers (e.g., the senior PI who founded SVEVR is also affiliated with CU-EIDAs).
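"High" inter-cluster edge density needs a reference point. A minimal sketch, assuming density means the number of cross-cluster co-authorship edges divided by the number of possible PI pairs between the two clusters, computes it for every cluster pair so that the PRRSV / CU-EIDAs pair (1, 2) can be ranked against all the others:

```python
from collections import defaultdict
from itertools import combinations


def intercluster_density(edges, manual):
    """Unweighted edge density between each pair of manual clusters.

    edges:  iterable of unique undirected (pi_a, pi_b) co-authorship edges
    manual: {pi: manual_cluster_id}
    Density for clusters (X, Y) = cross edges / (|X| * |Y|).
    """
    size = defaultdict(int)
    for cluster in manual.values():
        size[cluster] += 1
    cross = defaultdict(int)
    for a, b in edges:
        ca, cb = manual.get(a), manual.get(b)
        if ca is None or cb is None or ca == cb:
            continue  # skip unknown PIs and within-cluster edges
        cross[tuple(sorted((ca, cb)))] += 1
    return {pair: cross[pair] / (size[pair[0]] * size[pair[1]])
            for pair in combinations(sorted(size), 2)}
```

H3 would then read as the (1, 2) entry ranking at or near the top of this dictionary.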

H4: Method-driven super-clusters at low γ

At γ = 0.5, clusters with shared methodology (CAC-RU + Cardiac via proteomics; CU-ARM + CE-FID via WGS+AMR) will merge into method-driven super-clusters.
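H4 can be checked by asking whether the clusters predicted to merge share the same dominant community in the γ = 0.5 partition. A minimal sketch, assuming that partition is a {PI: community} dict and defining a manual cluster's dominant community as the one holding most of its PIs (an assumed criterion). CU-ARM and CE-FID correspond to clusters 4 and 5; CAC-RU does not appear among the 13 numbered clusters, so its ID would have to be supplied by the analyst.

```python
from collections import Counter, defaultdict


def dominant_community(manual, algorithmic):
    """Map each manual cluster to the algorithmic community that holds
    most of its PIs (ties go to the community seen first)."""
    votes = defaultdict(Counter)
    for pi, cluster in manual.items():
        if pi in algorithmic:
            votes[cluster][algorithmic[pi]] += 1
    return {c: v.most_common(1)[0][0] for c, v in votes.items()}


def merged_pairs(manual, algorithmic, predicted_pairs):
    """For each predicted (cluster_x, cluster_y) pair, True if both share
    the same dominant community at the chosen resolution."""
    dom = dominant_community(manual, algorithmic)
    return {pair: dom.get(pair[0]) is not None
            and dom.get(pair[0]) == dom.get(pair[1])
            for pair in predicted_pairs}
```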

H5: Hidden sub-clusters at high γ

At γ = 1.5-2.0, 5 manually identified clusters will split into the predicted sub-structure:

Pathology Biomarker (11) → 4 sub-clusters
CU-AF Theriogenology (6) → 3 sub-clusters
Cancer Mol Dx (9) → 2 sub-clusters
CU-ARM (4) → 3 sub-clusters
Cardiac (10) → 2 sub-clusters
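The split predictions can be compared mechanically. This sketch assumes the high-γ partition is a {PI: community} dict and counts, for each manual cluster, the communities holding at least two of its PIs; the two-PI floor (to ignore singletons) and the exact-match criterion are assumptions, not part of H5. The predicted counts are copied from the list above.

```python
from collections import Counter, defaultdict

# Predicted sub-cluster counts per manual cluster ID, as listed in H5.
PREDICTED_SPLITS = {11: 4, 6: 3, 9: 2, 4: 3, 10: 2}


def observed_splits(manual, algorithmic, min_members=2):
    """Number of algorithmic communities holding at least `min_members`
    PIs of each manual cluster (min_members is an assumed noise floor)."""
    per_cluster = defaultdict(Counter)
    for pi, cluster in manual.items():
        if pi in algorithmic:
            per_cluster[cluster][algorithmic[pi]] += 1
    return {c: sum(1 for n in counts.values() if n >= min_members)
            for c, counts in per_cluster.items()}


def h5_report(manual, algorithmic, predicted=PREDICTED_SPLITS):
    """{cluster: (predicted, observed, exact_match)} for each prediction."""
    observed = observed_splits(manual, algorithmic)
    return {c: (k, observed.get(c, 0), observed.get(c, 0) == k)
            for c, k in predicted.items()}
```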

H6: Cold spots = research-teaching gaps

Curriculum topics with no clear faculty research backing (~30% of all curriculum.js topics) will correlate with clusters scoring 3.5-4/6 on the maturity rubric.
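H6 pairs two binary properties per curriculum topic: whether it lacks faculty research backing, and whether its closest research cluster scores 3.5-4 on the maturity rubric. One way to test "will correlate", assumed here, is the phi coefficient of that 2x2 table. The topic-to-cluster mapping and the rubric scores are hypothetical inputs; curriculum.js would supply the topic list.

```python
from math import sqrt


def phi_coefficient(topics, maturity, band=(3.5, 4.0)):
    """Phi correlation between 'topic is a cold spot' and 'the topic's
    closest cluster scores inside `band` on the maturity rubric'.

    topics:   list of (is_cold_spot: bool, closest_cluster_id) pairs
    maturity: {cluster_id: rubric score out of 6}
    Returns 0.0 when a row or column of the 2x2 table is empty.
    """
    n = [[0, 0], [0, 0]]  # n[cold][in_band]
    for cold, cluster in topics:
        in_band = band[0] <= maturity[cluster] <= band[1]
        n[int(cold)][int(in_band)] += 1
    row1, row0 = sum(n[1]), sum(n[0])
    col1, col0 = n[0][1] + n[1][1], n[0][0] + n[1][0]
    denom = sqrt(row1 * row0 * col1 * col0)
    if denom == 0:
        return 0.0
    return (n[1][1] * n[0][0] - n[1][0] * n[0][1]) / denom
```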

Sensitivity analysis

Multiple sensitivity tests are pre-registered:

Resolution sensitivity: Louvain at γ ∈ {0.5, 1.0, 1.5, 2.0} → which γ best recovers manual clusters?
Method-based re-clustering: instead of clustering by topic, cluster by method signature (proteomics, WGS, cryobiology, etc.) — does the network look different?
Time-window sensitivity: 2014-2018 vs 2019-2024 sub-windows → does the cluster structure shift?
Author-disambiguation sensitivity: Scopus author-ID merging is imperfect, so the sensitivity of results to merge errors will be reported.
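For the resolution and time-window comparisons, two partitions of the same PIs need a similarity score. The sensitivity list does not name one; normalized mutual information (NMI) is a common choice and is assumed here. The sketch compares, for example, the 2014-2018 and 2019-2024 partitions over the PIs present in both: an NMI near 1 means the cluster structure is stable across windows, an NMI near 0 means it has been reshuffled.

```python
from collections import Counter
from math import log


def nmi(part_a, part_b):
    """Normalized mutual information 2*I(A;B) / (H(A) + H(B)) between two
    {pi: community} partitions, over the PIs present in both.

    Community labels need not match between partitions; only co-membership
    matters. Two single-community partitions are treated as identical (1.0).
    """
    common = [pi for pi in part_a if pi in part_b]
    n = len(common)
    if n == 0:
        return 0.0
    ca = Counter(part_a[p] for p in common)
    cb = Counter(part_b[p] for p in common)
    joint = Counter((part_a[p], part_b[p]) for p in common)
    h_a = -sum(c / n * log(c / n) for c in ca.values())
    h_b = -sum(c / n * log(c / n) for c in cb.values())
    mi = sum(c / n * log(c * n / (ca[x] * cb[y]))
             for (x, y), c in joint.items())
    return 2 * mi / (h_a + h_b) if h_a + h_b > 0 else 1.0
```

The same function applies to γ comparisons (e.g. the γ = 1.0 partition against γ = 1.5) and to partitions recomputed after deliberately perturbing author-ID merges.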

Validation procedure

When Scopus data arrives:

Test pre-registered hypotheses against actual data.
Hits + misses report: publish all pre-registered predictions and outcomes (both successes and failures), with no post-hoc revision.
Cluster-by-cluster narrative: per cluster, report manual definition vs algorithmic recovery + interpretation.
Bridge researcher analysis: report betweenness rankings; compare to predicted bridges.
Sub-cluster validation: at high γ, compare predicted sub-cluster splits vs algorithmic results.
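The no-post-hoc-revision rule can be made checkable. One option, assumed here rather than prescribed above, is to publish a SHA-256 digest of the canonicalized predictions before the Scopus data arrives, then refuse to generate the hits + misses report if the predictions no longer hash to that digest. The prediction IDs and outcome format are hypothetical.

```python
import hashlib
import json


def freeze(predictions):
    """SHA-256 digest of canonical JSON (sorted keys, fixed separators),
    so the same predictions always hash to the same value."""
    canonical = json.dumps(predictions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def hits_and_misses(predictions, outcomes, committed_digest):
    """List every prediction with HIT / MISS / UNTESTED.

    predictions: {hypothesis_id: prediction text}
    outcomes:    {hypothesis_id: True if confirmed, False if falsified}
    Raises ValueError if the predictions were edited after the digest
    was published.
    """
    if freeze(predictions) != committed_digest:
        raise ValueError("predictions differ from the pre-registered digest")

    def status(hid):
        if hid not in outcomes:
            return "UNTESTED"
        return "HIT" if outcomes[hid] else "MISS"

    return [(hid, predictions[hid], status(hid)) for hid in sorted(predictions)]
```

Every prediction appears in the report whatever its outcome, which matches the requirement to publish failures alongside successes.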

Why pre-registration matters here

Bibliometric studies are vulnerable to post-hoc cluster naming — researchers run clustering, see the output, then label clusters in a way that fits the data. This produces “high concordance” but tells us nothing about the predictive validity of the clustering approach.

Pre-registering manual cluster definitions + predictions before seeing Scopus data avoids this. The paper’s contribution depends on:

How accurate the manual ground truth turned out to be.
Where manual + algorithmic disagree (and why).
Which pre-registered predictions were validated vs falsified.

Implications

If the methodology is validated (high concordance + most pre-registered hypotheses confirmed):

The manual ground-truth approach has external validity for other faculties.
The 6-marker maturity rubric can be applied beyond Chula Vet.
Bridge-researcher prediction from public profiles is reliable.

If the methodology fails (low concordance + most predictions falsified):

Pure algorithmic clustering is more reliable than manual clustering.
The maturity rubric needs revision (or doesn’t generalize).
Manual analysis adds little above Scopus-based clustering.

Either outcome is publishable — both successes and failures of the methodology generate evidence about scientometric methods’ real-world reliability.

Citation

Danoi, A. (2026). Co-Authorship Network Analysis for Veterinary Research Clusters — Methodology. Working pre-print retrieved from {URL}/research/methodology/co-authorship-network-analysis.

Methodology updates and validation reports will be published as the Scopus analysis progresses.

Limitations

Manual cluster definitions are the work of one analyst (the paper’s first author) — inter-analyst reliability not yet tested.
Scopus coverage of Thai-language vet publications may be incomplete; some research output may be missing from the analysis.
Co-authorship as a proxy for collaboration omits non-publishing collaborations (committee work, informal mentorship).
The 137 faculty profiles are a snapshot of the current Chula Vet roster; future hires will require re-analysis.