Thank you for attending and making the event a success. See you next year!
Backed by 15 years of academic and research experience, Dr. Poo is enthusiastic about areas spanning Big Data Analytics, Machine Learning, Deep Learning, High Performance Computing, Distributed Systems, and peer-to-peer (P2P) networks. A graduate of Nagoya Institute of Technology (Japan), he holds a Doctorate (Ph.D.) in Computer Science, a Master of Information Technology (UKM), and a Bachelor of Science (UKM). He is also interested in algorithm design and performance modeling of various distributed systems based on a variety of theories, including optimization, graph theory, and network coding. He has a practical approach to problem solving and a drive to see things through to completion.
Description: The R programming language is entering a transformative era where artificial intelligence is fundamentally reshaping how we write code and conduct statistical analysis. This keynote explores the convergence of R's statistical heritage with cutting-edge AI capabilities—from AI coding assistants democratizing complex analyses to R's integration with modern machine learning frameworks.
Prof. Dr. Kamarul Imran Musa, a Professor of Epidemiology and Statistics at Universiti Sains Malaysia, has been teaching R programming for advanced data analysis in health and medicine since 2012. He has authored two books on R-based modeling and led numerous postgraduate courses covering basic to advanced epidemiological methods using R. With over 195 SCOPUS-indexed publications and 85,000+ citations, he ranks among the top 2% most cited researchers globally. His work integrates statistical modeling, AI, and digital health innovations, including the CaknaStrok m-Health app, reflecting his deep commitment to data-driven public health solutions.
This presentation introduces how data analysts can leverage AI assistants like ChatGPT, Claude, Gemini, and other large language models to enhance their R programming workflow. The REAAAPP framework (Read data, Explore data, Assumption, Analysis, Assessment, Present, Publish) provides a structured approach to incorporating AI throughout the analytical process. Learn practical strategies for prompt engineering, code generation and iterative problem-solving. Whether you're an epidemiologist, biostatistician, or data scientist, discover how AI assistants can accelerate your work while maintaining scientific rigor, reproducibility, and best practices in statistical computing.
Dr. Yu Yong Poh is an accomplished data science and AI expert with over 15 years of experience in both industry and academic settings. Dr. Yu has extensive experience in managing and delivering data science and AI projects across a range of industries, including banking, healthcare, manufacturing, and digital marketing. He has successfully led teams of data scientists and analysts, overseeing all aspects of project management and people leadership. In addition, Dr. Yu is a dedicated educator and trainer, conducting professional workshops on topics such as data science, machine learning, big data analytics, text and image analytics, and programming languages like Python and R.
How does a company truly become "data-driven"? It starts with insights, but it scales with automation and accessibility. This talk bridges the gap between R's powerful Tidyverse and tangible business outcomes. We'll cover practical case studies in finance, marketing, and operations, showing how R can be used to automate critical reports with Quarto/R Markdown, build dashboards for C-suite executives with Shiny, and model customer behavior to drive strategy.
Dr. Mohd Azmi Suliman is a Public Health Medicine Specialist at the Institute for Public Health, Ministry of Health Malaysia. He works extensively with the National Health and Morbidity Survey (NHMS), focusing on complex sampling design, data analysis, and statistical modeling using R. His interests include epidemiology, medical statistics, and promoting reproducible workflows for public health research.
"Many analysts assume a sample proportion reflects the population, but in large surveys this is rarely true. National health surveys, such as Malaysia’s NHMS, rely on complex sampling with stratification, clustering, and unequal probabilities of selection. Ignoring these features can bias prevalence estimates and mislead decision-makers.
This talk will explain, in plain language, why design matters and how a “simple proportion” can go wrong. Using examples from national surveys, I will show how weighting and design effects change both estimates and confidence intervals. Practical demonstrations in R with the survey and srvyr packages will illustrate how to specify designs and calculate valid prevalence. Attendees will leave with tools for more accurate, reproducible analyses."
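To make the point about weighting concrete, here is a minimal base R sketch. It is not taken from the talk: the data frame `toy`, its strata, and its weights are all invented for illustration and are not NHMS data. It only shows how an unweighted proportion can drift away from a design-weighted one when strata are sampled at unequal rates.

```r
# Hypothetical toy sample (invented numbers, not NHMS data).
# Rural respondents are over-sampled, so each one represents fewer people.
toy <- data.frame(
  stratum       = c(rep("urban", 6), rep("rural", 4)),
  has_condition = c(1, 0, 0, 0, 0, 0, 1, 1, 1, 0),
  # Sampling weight: number of people in the population each respondent represents
  weight        = c(rep(1000, 6), rep(100, 4))
)

# Naive estimate: every respondent counts equally
naive_prev <- mean(toy$has_condition)

# Design-weighted estimate: respondents count in proportion to their weight
weighted_prev <- weighted.mean(toy$has_condition, toy$weight)

print(c(naive = naive_prev, weighted = weighted_prev))
```

In this toy case the naive prevalence is 0.40 while the weighted prevalence is about 0.20. The sketch covers the point estimate only; valid standard errors and confidence intervals also need the stratification and clustering information, which is what a design object from the survey package (for example `svydesign()` followed by `svymean()`) is meant to carry.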
Dr Mohd Fittri Fahmi Fauzi is a Doctor of Public Health candidate at Universiti Sains Malaysia, specialising in epidemiology and biostatistics. His work focuses on infectious disease modelling, surveillance analytics, and the use of R for public health applications. He has developed predictive and diagnostic models in R, translating them into interactive R Shiny tools for clinical and epidemiologic decision support. With prior experience in the Ministry of Health Malaysia, he is passionate about applying open-source analytics to strengthen disease control systems.
Statistical models often remain confined to scripts and publications, distant from the hands of practitioners who could benefit most from them. This talk demonstrates how R Shiny can serve as a bridge, transforming static models into dynamic, user-friendly applications. Drawing from real-world public health projects, it explores practical workflows for converting analytical outputs into interactive tools that support decision-making, risk assessment, and resource prioritization. Emphasis will be placed on model translation, interface design, and reproducibility, showing how R can democratize statistical insight beyond the analyst’s desk.
Dr. Gan Chew Peng is a passionate educator and researcher with nearly twenty years of experience in statistics and actuarial science. She holds a PhD in Business (Actuarial Mathematics) and several SAS certifications. Her research interests include credit risk modeling, machine learning, and applied statistics. Known for her commitment to inspiring students and integrating real-world applications into teaching, Dr. Gan actively collaborates with industry partners and contributes to academic leadership to advance innovative learning and impactful research.
This study examines what drives automobile claim frequency in Singapore using the comprehensive Singapore Automobile Claims dataset from the General Insurance Association of Singapore (GIA). By integrating insights from driver demographics, vehicle characteristics, and insurance metrics, the research aims to improve risk assessment and premium pricing in the motor insurance sector. Three zero-inflated models—Zero-Inflated Poisson (ZIP), Zero-Inflated Negative Binomial (ZINB), and Zero-Inflated Double Poisson (ZIDP)—were employed to address the data’s over-dispersion and excess zeros. Model evaluation using Pearson’s χ², AIC, and BIC revealed that the ZIP model best captured the complexities of claim frequency patterns. Significant predictors include driver age, gender, vehicle type, vehicle age, and No Claims Discount (NCD). The findings offer actionable insights for insurers to design fairer, data-driven premium structures and highlight the value of statistical modeling in understanding risk behavior within Singapore’s dynamic urban landscape.
I am Dr. Wan Nor Arifin, a lecturer at the Biostatistics and Research Methodology Unit, School of Medical Sciences, Universiti Sains Malaysia. I teach medical statistics and some other interesting subjects. I am mainly interested in the development and validation of measurement tools and in machine learning, especially for use in clinical and public health settings. I am also interested in the R statistical programming language and in promoting its use in medical statistics. Of note, I use R every day, even as a desk calculator.
The talk describes the speaker's journey from being an R package user, to writing R functions, to sharing an R package on GitHub and getting it listed on CRAN, the R package repository.
Dr. Calyn is a Public Health Medicine Specialist at the Sector for Biostatistics and Data Repository, National Institutes of Health (NIH), Ministry of Health Malaysia. With 15 years of experience in the public healthcare system, including six years in clinical practice, she currently focuses on advancing data and information governance, particularly in research and clinical registry data. A passionate health advocate and storyteller, she bridges science and art within health and healthcare, applying creative and innovative approaches, and systems thinking to transform complex data into meaningful insights.
What if data could feel more human? This session explores how design thinking infuses empathy and creativity into data science using R. Through real-world health examples, we’ll discover how visualization, storytelling, and rapid prototyping can bridge the gap between analysis and action. Learn how R can become not just a statistical tool, but a medium for human-centered insight — transforming data into stories that inspire better health and better systems.
Dr Faiz is a Public Health Medicine Specialist at the Environmental Health Research Centre, Institute for Medical Research, National Institutes of Health Malaysia. His work focuses on environmental health, including studies in environmental epidemiology, air pollution, and climate change. He has led several national-level research projects and contributes to technical working groups on planetary health and environmental health policy under the Ministry of Health Malaysia.
This talk highlights how the dlnm package can be used in R to model short-term health effects of environmental exposures such as air pollution and temperature. Drawing on real-world studies from Malaysia, it will demonstrate practical steps in model setup, visualisation, and interpretation, showing how time-series analysis can uncover meaningful links between environment and health.
Dr. Mohammad Nasir Abdullah is a senior lecturer in the Department of Statistics at Universiti Teknologi MARA, Tapah Campus. With a first degree in statistics from UiTM under his belt, Dr. Nasir ventured into the professional world as a data analyst and a business analyst for both multinational and local companies. Fuelled by a passion for numbers, Dr. Nasir continued his academic pursuit, earning his second degree from USM and a certification as a data science specialist. In 2011, Dr. Nasir took the leap and became a lecturer, teaching mathematical statistics, probability and statistics, operations research, research methodology, statistical software, and statistical programming to students at diploma and degree levels. In 2022, Dr. Nasir was awarded his PhD in Statistics from UiTM, cementing his place as a respected and knowledgeable authority in the field. Dr. Nasir's current research interest lies in machine learning classifiers for data classification, with a particular focus on regularisation techniques.
This talk presents a practical end-to-end R workflow for discovering candidate biomarkers from array-based gene expression data. We begin with leakage-safe preprocessing, stratified data partitioning, and basic batch diagnostics. Features are shortlisted with Boruta and then refined through correlation pruning to improve stability. Multiple classifiers are compared fairly with identical resampling using tidymodels, and performance is evaluated with ROC AUC and PR AUC. We then calibrate predicted probabilities and select decision thresholds based on misclassification costs to support real clinical use. The session closes with a compact reproducibility template using targets and renv, plus clear reporting guidance for clinical and regulatory audiences. A short live demo takes a normalized expression matrix to a calibrated and interpretable candidate panel that is ready for external validation. Attendees leave with a checklist, code snippets, and a reusable folder structure.
Dr. Mohd Kamarulariffin is a medical doctor with a PhD from Universiti Malaya. His clinical background offers a valuable frontline perspective that strengthens his data-driven research. Currently serving as a Principal Investigator at the Biomedical Epidemiology Unit in the Institute for Medical Research (IMR), he focuses on health informatics and data analytics. He applies machine learning and simulation modelling to address critical healthcare challenges, including outbreak prediction and emergency department crowding.
In today’s data-driven world, text data is everywhere — from customer feedback and product reviews to social media posts and log files. Making sense of this information requires the ability to recognize patterns, extract insights, and clean messy text efficiently.
This 1.5-hour hands-on workshop introduces participants to the fundamentals of text analysis using regular expressions (regex) in Base R. Participants will learn how to identify and match text patterns, extract key information, and transform unstructured text into usable data — all without relying on additional packages.
Through guided coding exercises, attendees will explore how to use essential R functions to search, clean, and manipulate text data. Practical examples will demonstrate how regex can streamline data validation, automate cleaning tasks, and standardize text inputs across datasets.
By the end of the session, participants will gain practical skills and confidence in applying regular expressions in R for efficient text processing — a powerful capability for anyone working with real-world data.
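As a rough taste of the kind of exercise described above, the sketch below uses only base R functions (`grepl()`, `regexpr()`, `regmatches()`, `gsub()`, `trimws()`, `toupper()`). It is not workshop material: the `records` vector and both patterns are hypothetical and chosen purely for illustration.

```r
# Hypothetical messy records, invented for illustration
records <- c(
  "ID: A-102 ; phone 012-3456789",
  "id:b-7;  Phone 019-8765432",
  "no identifier here"
)

# Match: which records contain an ID-like pattern (letter, hyphen, digits)?
id_pattern <- "[A-Za-z]-[0-9]+"
has_id <- grepl(id_pattern, records)

# Extract: pull out the first ID from each record that has one,
# then standardize it to upper case
ids <- toupper(regmatches(records, regexpr(id_pattern, records)))

# Extract: phone numbers written as three digits, a hyphen, seven digits
phones <- regmatches(records, regexpr("[0-9]{3}-[0-9]{7}", records))

# Clean: collapse runs of whitespace to one space and trim the ends
cleaned <- trimws(gsub("[[:space:]]+", " ", records))

print(has_id)
print(ids)
print(phones)
print(cleaned)
```

Note that `regmatches()` with `regexpr()` returns only the records that matched, so `ids` and `phones` are shorter than `records`; `has_id` can be used to line them back up with the original rows.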
Dr. Yu Yong Poh is an accomplished data science and AI expert with over 15 years of experience in both industry and academic settings. Dr. Yu has extensive experience in managing and delivering data science and AI projects across a range of industries, including banking, healthcare, manufacturing, and digital marketing. He has successfully led teams of data scientists and analysts, overseeing all aspects of project management and people leadership. In addition, Dr. Yu is a dedicated educator and trainer, conducting professional workshops on topics such as data science, machine learning, big data analytics, text and image analytics, and programming languages like Python and R.
R basics, Data Manipulation with R, Machine Learning Fundamentals, Introduction to R Shiny