KnowHub Repository
KnowHub is a corporate knowledge base that provides a system to identify, capture, and publish the organization's most critical knowledge products, both tacit and explicit.
Recent Submissions
Item type: Item, Access status: Metadata only
Hybrid Deep Learning for Anti-Money Laundering: Unsupervised Detection of Emerging Schemes Via Feature Fusion and Explainable Artificial Intelligence
Machine Learning with Applications (Science Direct, 2026) Kungu, C.O.; Senagi, K.; Omondi, E.
Traditional rule-based anti-money laundering (AML) transaction monitoring systems suffer from high false-positive rates and rigidity in detecting complex emerging risks. This limitation has prompted changes to Financial Action Task Force (FATF) Recommendation 16, mandating the use of advanced systems for detecting money laundering schemes in cross-border payments. This study developed a hybrid framework integrating VAE-learned behavioural latent factors, GNN-captured relational network signals, and rule-based heuristics for enhanced anomaly detection. The model was evaluated on 54,258 real-world cross-border transaction records from an East African commercial bank. The One-Class SVM, optimised via a rigorous grid search, proved superior to the Isolation Forest and Local Outlier Factor benchmarks, achieving a precision of 99.63% in the top 5% of prioritised alerts. Independent validation by a Kenyan financial institution confirms a batch processing speed of 1,000 transactions per second on standard computer hardware (Intel Core i7, 16 GB RAM) and efficient high-priority alert triage, both key requirements for deployment in financial institutions. Shapley additive explanations (SHAP) analysis further provided interpretability of the feature contributions to model performance.
These results demonstrated that integrating rule-based features with deep-learning embeddings improves the efficiency of compliance work and offers a proven pathway for resource-constrained financial institutions to comply with the FATF regulatory demands upcoming in 2030.

Item type: Item, Access status: Metadata only
Integrating GOF Tests and Cross Validation for Copula Model Selection.
(Science Direct, 2026) Otieno, K.; Chaba, L.; Omondi, E.; Odhiambo, C.; Omolo, B.
In dependence modeling, choosing the right copula is crucial, as different copula models can yield distinct interpretations of the relationship between variables. However, real-world applications are often constrained by the limitations of existing copula selection methods, which lack consistency and robustness across datasets. The selection methods in the literature, which include goodness-of-fit (GoF) tests and selection criteria, often yield conflicting results, thereby misrepresenting the dependence structure and leading to misleading conclusions. This study developed an integrated copula selection framework that combines GoF tests with cross-validation techniques. We integrated block-based cross-validation with GoF tests, with the data partitioned into blocks of different sizes. A copula was fitted on the training set, and its performance was validated on the test set using GoF measures. The selection process was repeated across multiple folds, and an aggregation method was applied to determine the most suitable copula. The approach was tested through Monte Carlo simulations and an empirical study on weather variables in Kenya. The findings show that the Kendall-based Kolmogorov-Smirnov (KendallKS) and Cramér-von Mises (KendallCvM) test statistics, integrated with stratified cross-validation, perform better when the Benjamini-Hochberg (BH) procedure is used for aggregation.
The proposed tests successfully identified the true copula and consistently rejected incorrect alternatives, with performance improving as sample size and dependence level increased. The empirical application further demonstrates the method's robustness in real-world settings. These findings demonstrate that the proposed approach enhances the reliability and stability of copula selection.

Item type: Item, Access status: Metadata only
Towards an African-Led Model For Strengthening Capacity in Medical Statistics and Epidemiology in Sub-Saharan Africa: An Equitable Partnership Approach.
(Oxford Academic, 2026) Abdulla, M.; Mohammed, N.; Chirwa, T.; Abaasa, A.; Floyd, S.; Webb, E. L.; Ayieko, P.; Simms, V.; George, E.C.; Gachie, T.; Kiroro, F.; Weiss, H. A.
This paper describes the International Statistics and Epidemiology Partnership (ISEP), initiated in January 2024 through a 5-year UK Medical Research Council (MRC) grant. The demand for expertise in applied medical statistics and epidemiology in Africa far exceeds supply, and is growing rapidly with the rise of large-scale data sources such as electronic health records, genomics, geospatial data, and wearable technology. ISEP aims to implement a sustainable strategy to strengthen capacity in applied medical statistics across early and mid-career stages in Africa, with a pathway to becoming African-led by 2028. The partnership involves six African research institutions and two UK institutions. Key objectives include creating a collaborative network of medical statisticians, increasing knowledge through shared training, and raising awareness of the need for greater capacity.
Informed by a needs assessment survey, ISEP addresses identified barriers, including lack of mentors, limited formal training, and insufficient networking and job opportunities, through four work packages covering networking, training, stakeholder engagement, and partnership management.

Item type: Item, Access status: Metadata only
Evaluating the Impact of OMOP-CDM on Data Quality Insight Generation in Respiratory Disease Management.
(Frontiers, 2026) Yankam, B.M.; Luc Baudoin, F. T.; Andeso, P.; Onana Akoa, F. A.; Ebimbe, J. B.; Barasa, M.; Onana, M.; Iddi, S.; Kiragga, A.; Mbatchou Ngahane, B. H.; Data Science Without Borders Project
The increasing volume and heterogeneity of patient care data present significant challenges for comprehensive analysis and the generation of insights, particularly in specific areas such as respiratory diseases. Standardizing diverse health data is crucial for enabling large-scale observational research and ensuring data readiness. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) provides a widely adopted standard for harmonizing such data. However, evaluating the quality of data transformed into the OMOP CDM format is a critical step before its use in research or clinical decision support. This study evaluates the impact of the OMOP CDM standardization process on generating data quality insights for a respiratory disease dataset. The source dataset was initially paper-based, converted to an electronic format, and translated from French into English. This historical dataset covers the years 2009-2023 and contains 108 variables and 2,154 records. The data underwent the standard Extract, Transform, and Load (ETL) process to convert it into the OMOP CDM format. Following this transformation, the quality of the resulting OMOP CDM instance was assessed.
The Data Quality Dashboard (DQD) was utilized to evaluate the quality of the OMOP CDM database before and after ETL verification, with checks on completeness, plausibility, and conformance. Overall, the assessment conducted 2,344 checks, of which 2,269 passed and 75 failed, resulting in a corrected pass rate of 96% before ETL verification. After ETL verification, the assessment conducted 2,374 checks, of which 2,356 passed and 40 failed, resulting in a 100% corrected pass rate. Standardizing respiratory disease data using the OMOP CDM enabled a structured and transparent evaluation of data quality, demonstrating the utility of OMOP CDM in generating meaningful data quality insights, and highlighting the model's potential to enhance data readiness and support evidence-based decision-making in respiratory disease management.

Item type: Item, Access status: Metadata only
Comparing Allometric Models to Machine Learning Models for Aboveground Biomass Estimation in Agroforestry Systems in Kenya
Machine Learning with Applications (Science Direct, 2026) Kigotho, S. I.; Senagi, K.; Olukuru, J.; Makori, D.M.; Abdel-Rahman, E.M.; Omondi, E.
This study compared traditional allometric models with machine learning (ML) techniques for accurately estimating aboveground biomass (AGB) in six Acacia species within Kenyan agroforestry systems. Using tree diameter at breast height (DBH) and total height as inputs, the research evaluated allometric models (Chave’s, Brown’s, and Henry’s) against ML models, including Gradient Boosting (GB), Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Support Vector Regression (SVR). This research advances the field by benchmarking machine learning and classical allometric models at the species level within Kenyan agroforestry systems and employing SHapley Additive exPlanations (SHAP) for interpretability analysis.
Model performance was validated using cross-validation, and predictive accuracy was assessed using the coefficient of determination (R²), Root Mean Squared Error (RMSE), relative Root Mean Squared Error (rRMSE), and Mean Absolute Percentage Error (MAPE). The results demonstrated that ML models generally outperformed the allometric approaches. While Chave’s allometric model was the best-performing traditional method, ML models such as GB and XGBoost achieved superior accuracy for most species, as reflected by higher R² and lower errors as measured by RMSE, rRMSE, and MAPE. For example, GB performed best for Acacia drepanolobium (R² = 0.989), while XGBoost showed the highest accuracy for Acacia nilotica (R² = 0.990). RF also demonstrated strong and stable performance, whereas SVR exhibited comparatively lower and less consistent accuracy across species. SHAP value analysis indicated that DBH was the most influential predictor across all ML models, with tree height providing complementary explanatory power. The findings highlight the superior adaptability of ML models in capturing complex, non-linear relationships in heterogeneous agroforestry environments. This study contributes empirical evidence supporting the integration of ML techniques with conventional allometric approaches as a robust framework for improving AGB estimation. Future research should incorporate additional predictors such as wood density and integrate remote sensing data such as Light Detection and Ranging (LiDAR) to enhance scalability and precision, thereby supporting improved carbon stock assessments and agroforestry-based carbon credit initiatives.
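
For readers unfamiliar with the accuracy measures named in the abstract above (coefficient of determination, RMSE, rRMSE, and MAPE), the following is a minimal illustrative sketch of how they are commonly computed. It is not the authors' code. The function name `regression_metrics` is hypothetical, the sample values are invented for illustration, and the rRMSE definition used here (RMSE divided by the mean observed value, as a percentage) is an assumption, since the abstract does not define it.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2, RMSE, rRMSE (%) and MAPE (%) for paired values.

    Assumes the observed values are non-zero and not all identical,
    otherwise MAPE and R^2 are undefined.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mape = sum(abs(r / o) for r, o in zip(residuals, observed)) / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "rrmse_pct": 100.0 * rmse / mean_obs,  # assumed definition of rRMSE
        "mape_pct": 100.0 * mape,
    }


# Hypothetical biomass values in kg, for illustration only (not from the study).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE is reported in the units of the response, while rRMSE and MAPE are scale-free, which is presumably why such measures are reported together when comparing models across species of different sizes.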




