Publications
This page provides a complete overview of my publications, including journal articles, conference papers, and other scholarly contributions.
2025
- COLING
Leveraging Taxonomy and LLMs for Improved Multimodal Hierarchical Classification. Shijing Chen, Mohamed Reda Bouadjenek, Usman Naseem, Basem Suleiman, Shoaib Jameel, Flora Salim, Hakim Hacid, and Imran Razzak. In Proceedings of the 31st International Conference on Computational Linguistics, Jan 2025. Multi-level Hierarchical Classification (MLHC) tackles the challenge of categorizing items within a complex, multi-layered class structure. However, traditional MLHC classifiers often rely on a backbone model with n independent output layers, which tend to ignore the hierarchical relationships between classes. This oversight can lead to inconsistent predictions that violate the underlying taxonomy. Leveraging Large Language Models (LLMs), we propose a novel taxonomy-embedded, transitional, LLM-agnostic framework for multimodal classification. The cornerstone of this advancement is the ability of models to enforce consistency across hierarchical levels. Our evaluations on the MEP-3M dataset (a Multi-modal E-commerce Product dataset with various hierarchical levels) demonstrated a significant performance improvement compared to conventional LLM structures.
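To make the consistency property the abstract emphasizes concrete, here is a minimal sketch of how one might measure whether per-level predictions respect a two-level taxonomy. The toy taxonomy, class counts, and function name are hypothetical; this is an illustration of the evaluation idea, not the paper's implementation.

```python
import numpy as np

# Hypothetical two-level taxonomy: child class index -> parent class index.
parent_of = np.array([0, 0, 1, 1, 2])  # e.g., 5 fine classes under 3 coarse ones

def consistency_rate(parent_preds: np.ndarray, child_preds: np.ndarray) -> float:
    """Fraction of samples whose predicted child class is a descendant of the
    predicted parent class -- the kind of taxonomy consistency that independent
    per-level output layers tend to violate."""
    return float(np.mean(parent_of[child_preds] == parent_preds))

# Toy usage: two of three predictions respect the hierarchy.
parents = np.array([0, 1, 2])
children = np.array([1, 2, 0])  # child 0 belongs to parent 0, not 2 -> inconsistent
print(consistency_rate(parents, children))  # 0.666...
```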
-
Improving out-of-distribution detection by enforcing confidence margin. Lakpa Tamang, Mohamed Reda Bouadjenek, Richard Dazeley, and Sunil Aryal. Knowledge and Information Systems, Jan 2025. In many critical machine learning applications, such as autonomous driving and medical image diagnosis, the detection of out-of-distribution (OOD) samples is as crucial as accurately classifying in-distribution (ID) inputs. Recently, outlier exposure (OE)-based methods have shown promising results in detecting OOD inputs via model fine-tuning with auxiliary outlier data. However, most previous OE-based approaches emphasize synthesizing extra outlier samples or introducing regularization to diversify the OOD sample space, which is rather unquantifiable in practice. In this work, we propose a novel and straightforward method called Margin-bounded Confidence Scores (MaCS) to address the nontrivial OOD detection problem by enlarging the disparity between ID and OOD scores, which in turn makes the decision boundary more compact, facilitating effective segregation with a simple threshold. Specifically, we augment the learning objective of an OE-regularized classifier with a supplementary constraint, which penalizes high confidence scores for OOD inputs compared to those of ID inputs, significantly enhancing OOD detection performance while maintaining ID classification accuracy. Extensive experiments on various benchmark datasets for image classification tasks demonstrate the effectiveness of the proposed method, which significantly outperforms state-of-the-art methods on various benchmarking metrics. The code is publicly available at https://github.com/lakpa-tamang9/margin_ood/tree/kais.
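A minimal sketch of the margin-bounded idea as I read the abstract (not the released code at the repository above): cross-entropy on ID data plus a hinge penalty that keeps OOD confidence at least a margin below ID confidence. The use of maximum softmax probability, the margin value, and the weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def macs_style_loss(logits_id, labels_id, logits_ood, margin=0.2, lam=1.0):
    """Sketch of a margin-bounded outlier-exposure objective: standard
    cross-entropy on in-distribution (ID) inputs, plus a hinge penalty that
    fires whenever the mean maximum-softmax confidence on outlier (OOD)
    inputs comes within `margin` of the ID confidence."""
    ce = F.cross_entropy(logits_id, labels_id)
    conf_id = logits_id.softmax(dim=1).amax(dim=1).mean()
    conf_ood = logits_ood.softmax(dim=1).amax(dim=1).mean()
    # Penalize high OOD confidence relative to ID confidence.
    margin_penalty = F.relu(conf_ood - conf_id + margin)
    return ce + lam * margin_penalty
```

At test time, the enlarged gap between ID and OOD confidence scores is what allows a single threshold to separate the two populations.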
- WWW
Unmasking Gender Bias in Recommendation Systems and Enhancing Category-Aware Fairness. Tahsin Alamgir Kheya, Mohamed Reda Bouadjenek, and Sunil Aryal. In Proceedings of the ACM on Web Conference 2025, Sydney NSW, Australia, Jan 2025. Recommendation systems are now an integral part of our daily lives. We rely on them for tasks such as discovering new movies, finding friends on social media, and connecting job seekers with relevant opportunities. Given their vital role, we must ensure these recommendations are free from societal stereotypes; therefore, evaluating and addressing such biases in recommendation systems is crucial. Previous work evaluating the fairness of recommended items fails to capture certain nuances, as it mainly focuses on comparing performance metrics for different sensitive groups. In this paper, we introduce a set of comprehensive metrics for quantifying gender bias in recommendations. Specifically, we show the importance of evaluating fairness at a more granular level, which can be achieved using our metrics to capture gender bias through categories of recommended items, such as genres for movies. Furthermore, we show that employing a category-aware fairness metric as a regularization term alongside the main recommendation loss during training can effectively minimize bias in the models’ output. We experiment on three real-world datasets, using five baseline models alongside two popular fairness-aware models, to show the effectiveness of our metrics in evaluating gender bias. Our metrics provide enhanced insight into bias in recommended items compared to previous metrics. Additionally, our results demonstrate how incorporating our regularization term significantly improves fairness in recommendations across different categories without substantial degradation in overall recommendation performance.
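To ground the regularization idea, here is a hedged sketch of a category-aware fairness penalty added to the recommendation loss. The paper's actual metric differs; the gender grouping, squared-gap form, and names below are illustrative assumptions.

```python
import torch

def category_fairness_reg(scores, genders, item_categories, num_categories):
    """Minimal sketch of a category-aware fairness regularizer: for each item
    category, penalize the squared gap between the mean predicted scores of
    the two gender groups.

    scores: (batch,) predicted relevance; genders: (batch,) in {0, 1};
    item_categories: (batch,) category id of each scored item."""
    penalty = scores.new_zeros(())
    for c in range(num_categories):
        in_cat = item_categories == c
        g0 = scores[in_cat & (genders == 0)]
        g1 = scores[in_cat & (genders == 1)]
        if len(g0) > 0 and len(g1) > 0:
            penalty = penalty + (g0.mean() - g1.mean()) ** 2
    return penalty / num_categories

# Training would then minimize: rec_loss + lambda_fair * category_fairness_reg(...)
```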
-
A comprehensive study of audio profiling: Methods, applications, challenges, and future directions. Anil Pudasaini, Muna Al-Hawawreh, Mohamed Reda Bouadjenek, Hakim Hacid, and Sunil Aryal. Neurocomputing, Jan 2025. Audio profiling is at the forefront of a technological breakthrough, offering rich insights into human behavior, emotions, physical attributes, and environmental contexts through detailed analysis of voice data. As we embrace an era where smart technologies equipped with the ability to capture sound are becoming ubiquitous, the capacity to accurately infer personal traits such as age, gender, height, weight, emotional state, personality, and even environmental contexts through voice analysis opens up vast opportunities across law enforcement, healthcare, social and commercial services, and entertainment. This emerging field promises to enhance our interaction with technology, not only by understanding who we are but also by interpreting the world around us. However, this remarkable landscape is fraught with challenges, including data imbalances, the complexity of predictive models, and significant privacy concerns regarding the handling of sensitive paralinguistic information. This survey delves into the current landscape of audio profiling, examining the techniques and datasets in use and showcasing its diverse applications, while highlighting the need for advanced methodologies, enriched dataset development, and robust privacy preservation techniques.
-
Data-driven machinery fault diagnosis: A comprehensive review. Dhiraj Neupane, Mohamed Reda Bouadjenek, Richard Dazeley, and Sunil Aryal. Neurocomputing, Jan 2025. In this era of advanced manufacturing, it is now more crucial than ever to diagnose machine faults as early as possible to guarantee their safe and efficient operation. With the increasing complexity of modern industrial processes, traditional machine health monitoring approaches cannot provide efficient performance. With the massive surge in industrial big data and the advancement of sensing and computational technologies, data-driven machinery fault diagnosis solutions based on machine/deep learning approaches have been used ubiquitously in manufacturing applications. Timely and accurate identification of faulty machine signals is vital in industrial applications, for which many relevant solutions have been proposed and reviewed in earlier articles. Despite the availability of numerous solutions and reviews on machinery fault diagnosis, existing works often lack several aspects. Most of the available literature has limited applicability in a wide range of manufacturing settings due to its concentration on a particular type of equipment or method of analysis. Additionally, discussions of the challenges associated with implementing data-driven approaches, such as dealing with noisy data, selecting appropriate features, and adapting models to accommodate new or unforeseen faults, are often superficial or overlooked entirely. Thus, this survey provides a comprehensive review of articles that use different machine learning approaches to detect and diagnose various machinery faults, highlighting their strengths and limitations. It also reviews the methods used for predictive analyses, comprehensively discusses the available machinery fault datasets, introduces future researchers to the challenges they may encounter when using these approaches for fault diagnosis, and recommends probable solutions to mitigate those problems. Future research prospects are also pointed out for a better understanding of the field. We believe that this article will help researchers and contribute to the further development of the field.
- ECML
Bias vs Bias Dawn of Justice: A Fair Fight in Recommendation Systems. Tahsin Alamgir Kheya, Mohamed Reda Bouadjenek, and Sunil Aryal. In Machine Learning and Knowledge Discovery in Databases. Research Track, Jan 2025. Recommendation systems play a crucial role in our daily lives by shaping user experience across various domains, including e-commerce, job advertisements, and entertainment. Given the vital role of such systems, practitioners must ensure they do not produce unfair and imbalanced recommendations. Previous work addressing bias in recommendations overlooked bias in certain item categories, potentially leaving some biases unaddressed. Additionally, most previous work on fair re-ranking focused on binary sensitive attributes. In this paper, we address these issues by proposing a fairness-aware re-ranking approach that helps mitigate bias in different categories of items. This re-ranking approach leverages existing biases to correct disparities in recommendations across various demographic groups. We show how our approach can mitigate bias on multiple sensitive attributes, including gender, age, and occupation. We experimented on three real-world datasets to evaluate the effectiveness of our re-ranking scheme in mitigating bias in recommendations. Our results show how this approach helps mitigate social bias with little to no degradation in performance.
- ECAI
HI-PMK: A Data-Dependent Kernel for Incomplete Heterogeneous Data Representation. Youran Zhou, Mohamed Reda Bouadjenek, Jonathan Wells, and Sunil Aryal. In ECAI 2025, Jan 2025. Handling incomplete and heterogeneous data remains a central challenge in real-world machine learning, where missing values may follow complex mechanisms (MCAR, MAR, MNAR) and features can be of mixed types (numerical and categorical). Existing methods often rely on imputation, which may introduce bias or privacy risks, or fail to jointly address data heterogeneity and structured missingness. We propose the Heterogeneous Incomplete Probability Mass Kernel (HI-PMK), a novel data-dependent representation learning approach that eliminates the need for imputation. HI-PMK introduces two key innovations: (1) a probability mass-based dissimilarity measure that adapts to local data distributions across heterogeneous features (numerical, ordinal, nominal); and (2) a missingness-aware uncertainty strategy (MaxU) that conservatively handles all three missingness mechanisms by assigning maximal plausible dissimilarity to unobserved entries. Our approach is privacy-preserving, scalable, and readily applicable to downstream tasks such as classification and clustering. Extensive experiments on over 15 benchmark datasets demonstrate that HI-PMK consistently outperforms traditional imputation-based pipelines and kernel methods across a wide range of missing data settings. Code is available at: github.com/echoid/Incomplete-Heter-Kernel.
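A toy sketch of the probability-mass idea, under stated assumptions: the dissimilarity of two values is the fraction of data mass lying between them, so distances adapt to the local distribution, and missing entries fall back to the maximal dissimilarity in the spirit of MaxU. This simplifies to a single numerical feature and is not the authors' formulation.

```python
import numpy as np

def pm_dissimilarity(a, b, feature_values):
    """Probability-mass dissimilarity for one numerical feature: two values
    are far apart if a large share of the observed data lies between them.
    If either value is missing, conservatively return the maximal
    dissimilarity of 1.0 (a MaxU-style fallback)."""
    if np.isnan(a) or np.isnan(b):
        return 1.0  # missingness-aware maximal-uncertainty fallback
    lo, hi = min(a, b), max(a, b)
    return float(np.mean((feature_values >= lo) & (feature_values <= hi)))

values = np.array([1.0, 1.1, 1.2, 5.0, 9.8, 9.9, 10.0])
print(pm_dissimilarity(1.0, 1.2, values))     # small: few points in between
print(pm_dissimilarity(1.0, 10.0, values))    # large: almost all mass in between
print(pm_dissimilarity(np.nan, 5.0, values))  # 1.0 under the MaxU fallback
```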
- CIKM
MissDDIM: Deterministic and Efficient Conditional Diffusion for Tabular Data Imputation. Youran Zhou, Mohamed Reda Bouadjenek, and Sunil Aryal. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, Seoul, Republic of Korea, Jan 2025. Diffusion models have recently emerged as powerful tools for missing data imputation by modeling the joint distribution of observed and unobserved variables. However, existing methods, typically based on stochastic denoising diffusion probabilistic models (DDPMs), suffer from high inference latency and variable outputs, limiting their applicability in real-world tabular settings. To address these deficiencies, this paper presents MissDDIM, a conditional diffusion framework that adapts Denoising Diffusion Implicit Models (DDIMs) for tabular imputation. While stochastic sampling enables diverse completions, it also introduces output variability that complicates downstream processing. MissDDIM replaces this with a deterministic, non-Markovian sampling path, yielding faster and more consistent imputations. To better leverage incomplete inputs during training, we introduce a self-masking strategy that dynamically constructs imputation targets from observed features, enabling robust conditioning without requiring fully observed data. Experiments on five benchmark datasets demonstrate that MissDDIM matches or exceeds the accuracy of state-of-the-art diffusion models while significantly improving inference speed and stability. These results highlight the practical value of deterministic diffusion for real-world imputation tasks.
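For intuition, here is one deterministic (eta = 0) DDIM update adapted to imputation. This is a sketch of the general DDIM-for-inpainting recipe, not MissDDIM's exact code; clamping observed entries back at each step is one common conditioning strategy and an assumption here.

```python
import torch

@torch.no_grad()
def ddim_impute_step(x_t, eps_hat, abar_t, abar_prev, x_obs, mask):
    """One deterministic DDIM transition for imputation. `abar_t` and
    `abar_prev` are scalar tensors holding the cumulative noise schedule;
    `mask` is 1 where a feature is observed, so only the missing part is
    denoised while observed entries stay fixed."""
    # Clean-sample estimate implied by the predicted noise.
    x0_hat = (x_t - (1 - abar_t).sqrt() * eps_hat) / abar_t.sqrt()
    # Deterministic (eta = 0) DDIM step: no stochastic noise term,
    # so repeated runs produce identical imputations.
    x_prev = abar_prev.sqrt() * x0_hat + (1 - abar_prev).sqrt() * eps_hat
    # Condition on the observed features at every step.
    return mask * x_obs + (1 - mask) * x_prev
```

The absence of the stochastic noise term is what yields the consistent, repeatable outputs the abstract highlights, and skipping intermediate timesteps along the non-Markovian path is what buys the speedup.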
-
A Scoping Review of Large Language Model Chatbot Use for Alcohol and Other Drug Health Information. Natasha Harding, Nataly Bovopoulos, Dotahn Caspi, Craig Martin, Skye McPhie, Mohamed Reda Bouadjenek, Sunil Aryal, and Michael Hobbs. Drug and Alcohol Review, Jan 2025. Issues: While people prefer to seek alcohol and other drug (AOD) information online, there can be quality and accessibility issues with these sources. Large Language Model (LLM)-based chatbots are an emerging technology that may present an opportunity to overcome these barriers. We aimed to review the literature on the use of chatbots for seeking AOD health information, particularly the benefits, challenges, and recommendations for future use. Approach: Scoping review methodology was used to conduct a systematic search of four databases for English-language studies from the last five years relating to the use of chatbots to seek AOD health information. This resulted in the screening of 243 articles, with five studies included. Key Findings: There has been growing interest in the topic, though evidence is still limited. Despite identified benefits of chatbot use, such as accuracy, appropriateness, overall experience, and the provision of supporting documentation, important challenges were noted regarding user safety, lack of referral, quality and readability issues, and lack of adherence to current guidelines, with mixed results regarding evidence-based responses. Only three of the five studies recommended chatbots for AOD information seeking. Implications/Conclusion: The current review suggests that gaps in knowledge remain in the areas of accuracy, user safety, readability, evidence base, and quality of LLM chatbot responses to AOD questions. More research is needed to investigate the applicability of LLMs in obtaining safe, non-stigmatising AOD information.
- KnoSys
Taxonomy-guided routing in capsule network for hierarchical image classification. Khondaker Tasrif Noor, Wei Luo, Antonio Robles-Kelly, Leo Yu Zhang, and Mohamed Reda Bouadjenek. Knowledge-Based Systems, Jan 2025. Hierarchical multi-label classification in computer vision presents significant challenges in maintaining consistency across different levels of class granularity while capturing fine-grained visual details. This paper presents the Taxonomy-aware Capsule Network (HT-CapsNet), a novel capsule network architecture that explicitly incorporates taxonomic relationships into its routing mechanism to address these challenges. Our key innovation lies in a taxonomy-aware routing algorithm that dynamically adjusts capsule connections based on known hierarchical relationships, enabling more effective learning of hierarchical features while enforcing taxonomic consistency. Extensive experiments on six benchmark datasets, including Fashion-MNIST, Marine-Tree, CIFAR-10, CIFAR-100, CUB-200-2011, and Stanford Cars, demonstrate that HT-CapsNet significantly outperforms existing methods across various hierarchical classification metrics. Notably, on CUB-200-2011, HT-CapsNet achieves absolute improvements of 10.32%, 10.2%, 10.3%, and 8.55% in hierarchical accuracy, F1-score, consistency, and exact match, respectively, compared to the best-performing baseline. On the Stanford Cars dataset, the model improves upon the best baseline by 21.69%, 18.29%, 37.34%, and 19.95% on the same metrics, demonstrating the robustness and effectiveness of our approach for complex hierarchical classification tasks.
2024
- TWeb
User Experience and the Role of Personalization in Critiquing-Based Conversational Recommendation. Arpit Rana, Scott Sanner, Mohamed Reda Bouadjenek, Ronald Di Carlantonio, and Gary Farmaner. ACM Trans. Web, Oct 2024. Critiquing, where users propose directional preferences on attribute values, has historically been a highly popular method for conversational recommendation. However, with the growing size of catalogs and item attributes, it becomes increasingly difficult and time-consuming to express all of one’s constraints and preferences in the form of critiques. This is even more confusing in the case of critiquing failures: when the system returns no matching items in response to user critiques. To this end, it would seem important to combine a critiquing-based conversational system with a personalized recommendation component to capture implicit user preferences and thus reduce the user’s burden of providing explicit critiques. To examine the impact of such personalization on critiquing, this article reports on a user study with 228 participants to understand user critiquing behavior for two different recommendation algorithms: (i) non-personalized, which recommends any item consistent with the user critiques; and (ii) personalized, which leverages a user’s past preferences on top of user critiques. In the study, we ask users to find the restaurant they think is most suitable for a given scenario by critiquing the recommended restaurants at each round of the conversation along the dimensions of price, cuisine, category, and distance. We observe that the non-personalized recommender leads to more critiquing interactions, more severe critiquing failures, more time overall for users to express their preferences, and longer dialogs to find their item of interest. We also observe that non-personalized users were less satisfied with the system’s performance: they found its recommendations less relevant, more unexpected, and about as diverse and surprising as those of the personalized recommender. The results of our user study highlight an imperative for further research on the integration of the two complementary components of personalization and critiquing to achieve the best overall user experience in future critiquing-based conversational recommender systems.
- IJCNN
MARKS-mech: A Mask-based Prior Knowledge Dissemination Mechanism for including Discourse Relations for Sentiment Classification. Shashank Gupta, Antonio Robles-Kelly, Mohamed Reda Bouadjenek, Asef Nazari, and Dhananjay Thiruvady. In 2024 International Joint Conference on Neural Networks (IJCNN), Oct 2024.
-
Shine: A deep learning-based accessible parking management system. Dhiraj Neupane, Aashish Bhattarai, Sunil Aryal, Mohamed Reda Bouadjenek, Ukmin Seok, and Jongwon Seok. Expert Systems with Applications, Oct 2024. The ongoing expansion of urban areas, facilitated by advancements in science and technology, has resulted in a considerable increase in the number of privately owned vehicles worldwide, including in South Korea. However, this gradual increase in the number of vehicles has inevitably led to parking-related issues, including the abuse of disabled parking spaces (hereafter referred to as accessible parking spaces) designated for individuals with disabilities. Traditional license plate recognition (LPR) systems have proven inefficient in addressing this problem in real time due to the high frame rate of surveillance cameras, the presence of natural and artificial noise, and variations in lighting and weather conditions that impede detection and recognition. With the growing concept of Parking 4.0, many sensor-, IoT-, and deep learning-based approaches have been applied to automatic LPR and parking management systems. Nonetheless, the literature shows a need for a robust and efficient model for managing accessible parking spaces in South Korea. To address this, we propose a novel system called ‘Shine’, which uses a deep learning-based object detection algorithm for detecting the vehicle, license plate, and disability badges (referred to as cards, badges, or access badges hereafter) and verifies the driver’s right to use accessible parking spaces by coordinating with the central server. Our model, which achieves a mean average precision of 92.16%, is expected to address the issue of accessible parking space abuse and contribute significantly toward efficient and effective parking management in urban environments.
- PAKDD
MLT-Trans: Multi-level Token Transformer for Hierarchical Image Classification. Tanya Boone Sifuentes, Asef Nazari, Mohamed Reda Bouadjenek, and Imran Razzak. In Advances in Knowledge Discovery and Data Mining, Oct 2024. This paper focuses on Multi-level Hierarchical Classification (MLHC) of images, presenting a novel architecture that exploits the “[CLS]” (classification) token within transformers, often disregarded in computer vision tasks. Our primary goal is to utilize the information of every [CLS] token in a hierarchical manner. Toward this aim, we introduce the Multi-level Token Transformer (MLT-Trans). This model, trained with sharpness-aware minimization and a hierarchical loss function based on knowledge distillation, can be adapted to various transformer-based networks, with our choice being the Swin Transformer as the backbone model. Empirical results across diverse hierarchical datasets confirm the efficacy of our approach. The findings highlight the potential of combining transformers and [CLS] tokens, demonstrating improvements in hierarchical evaluation metrics and accuracy of up to 5.7% on the last level compared to the base network, thereby supporting the adoption of the MLT-Trans framework in MLHC.
-
The STOIC2021 COVID-19 AI challenge: Applying reusable training methodologies to private data. Luuk H. Boulogne, Julian Lorenz, Daniel Kienzle, Robin Schön, Katja Ludwig, Rainer Lienhart, Simon Jégou, Guang Li, and 30 more authors. Medical Image Analysis, Oct 2024. Challenges drive the state of the art of automated medical image analysis, but the quantity of public training data that they provide can limit the performance of their solutions, and public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training methodologies. With T3, challenge organizers train a codebase provided by the participants on sequestered training data. T3 was implemented in the STOIC2021 challenge, with the goal of predicting from a computed tomography (CT) scan whether subjects had a severe COVID-19 infection, defined as intubation or death within one month. STOIC2021 consisted of a Qualification phase, where participants developed challenge solutions using 2000 publicly available CT scans, and a Final phase, where participants submitted their training methodologies, with which solutions were trained on CT scans of 9724 subjects. The organizers successfully trained six of the eight Final phase submissions. The submitted codebases for training and running inference were released publicly. The winning solution obtained an area under the receiver operating characteristic curve of 0.815 for discerning between severe and non-severe COVID-19. The Final phase solutions of all finalists improved upon their Qualification phase solutions.
-
A consistency-aware deep capsule network for hierarchical multi-label image classification. Khondaker Tasrif Noor, Antonio Robles-Kelly, Leo Yu Zhang, Mohamed Reda Bouadjenek, and Wei Luo. Neurocomputing, Oct 2024. Hierarchical classification is a significant challenge in computer vision due to the logical order and interconnectedness of multiple labels. This paper presents HD-CapsNet, a novel neural network architecture based on deep capsule networks, specifically designed for hierarchical multi-label classification (HMC). By incorporating a tree-like hierarchical structure, HD-CapsNet is designed to leverage the inherent ontological order within the hierarchical label tree, thereby ensuring classification consistency across different levels. Additionally, we introduce a specialized loss function that promotes accurate hierarchical relationships while penalizing inconsistencies. This not only enhances classification performance but also strengthens the network’s robustness. We rigorously evaluate HD-CapsNet’s efficacy by benchmarking it against existing HMC methods across six diverse datasets: Fashion-MNIST, Marine-Tree, CIFAR-10, CIFAR-100, Caltech-UCSD Birds-200-2011, and Stanford Cars. Our results conclusively demonstrate that HD-CapsNet excels in learning hierarchical relationships and significantly outperforms the competition in various image classification tasks. Our implementation is available at https://github.com/tasrif-khondaker/HD-CapsNet.
- ECML
Missing Data Imputation: Do Advanced ML/DL Techniques Outperform Traditional Approaches? Youran Zhou, Mohamed Reda Bouadjenek, and Sunil Aryal. In Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track, Oct 2024. Missing data poses a significant challenge in real-world data analysis, prompting the development of various imputation methods. However, existing literature often overlooks two critical limitations. First, many methods assume a Missing Completely At Random (MCAR) mechanism, which is relatively easy to handle but may not reflect real-world scenarios, where data is often missing due to underlying mechanisms that are unknown; such missing data is categorized as Missing At Random (MAR) or Missing Not At Random (MNAR). Second, the effectiveness of these methods is primarily assessed solely in terms of imputation accuracy, using metrics such as Root Mean Square Error (RMSE), ignoring the practical utility of imputed data in downstream tasks. In this study, we comprehensively compare a broad spectrum of missing data imputation techniques, ranging from traditional statistical methods to advanced machine and deep learning approaches. Our evaluation considers their effectiveness in handling various missing mechanisms across different missingness parameters. Furthermore, we assess the imputed data’s quality not only in terms of RMSE but also by its impact on downstream tasks such as classification, regression, and clustering. Contrary to common assumptions, our findings reveal that the superiority of complex deep learning-based methods over simple traditional techniques is not guaranteed. Moreover, relying solely on RMSE for evaluation can be misleading. Instead, selecting an imputation method should prioritise its effectiveness in enhancing the performance of learning algorithms in downstream tasks.
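The paper's central evaluation point, that imputers should be judged by downstream utility and not only RMSE, is easy to reproduce in miniature with scikit-learn. The dataset, the MCAR-only missingness injection, and the model choices below are illustrative, not the paper's experimental setup.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Judge imputers by downstream accuracy, not only reconstruction error.
X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan  # inject 20% MCAR missingness

for imputer in (SimpleImputer(strategy="mean"), KNNImputer(n_neighbors=5)):
    clf = make_pipeline(imputer, StandardScaler(), LogisticRegression(max_iter=1000))
    acc = cross_val_score(clf, X_missing, y, cv=5).mean()
    print(type(imputer).__name__, f"downstream accuracy: {acc:.3f}")
```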
- CIKM
Covid19-twitter: A Twitter-based Dataset for Discourse Analysis in Sentence-level Sentiment Classification. Shashank Gupta, Mohamed Reda Bouadjenek, Antonio Robles-Kelly, Tsz-Kwan Lee, Thanh Thi Nguyen, Asef Nazari, and Dhananjay Thiruvady. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, Oct 2024. For the sentence-level sentiment classification task, learning Contrastive Discourse Relations (CDRs) such as a-but-b is difficult for Deep Neural Networks (DNNs) via purely data-driven training. Several methods exist in the literature for disseminating CDR information into DNNs, but there is no dedicated dataset available to effectively test their dissemination performance. In this paper, we propose a new large-scale dataset for this purpose called Covid19-twitter, which contains around 100k tweets symmetrically divided into various categories. Instead of manual annotation, we used a combination of emoji analysis and a lexicon-based tool called the Valence Aware Dictionary and sEntiment Reasoner (VADER) to perform automatic labelling of the tweets, while also ensuring high accuracy of the annotation process through quality checks. We also provide benchmark performances of several baselines on our dataset for both the sentiment classification and CDR dissemination tasks. We believe that this dataset will be valuable for discourse analysis research in sentiment classification.
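For reference, the lexicon-based half of such a labelling pipeline can be sketched with the public VADER package. The ±0.05 compound-score thresholds are VADER's conventional defaults, and this sketch omits the emoji analysis and quality checks the paper combines with it.

```python
# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def auto_label(tweet: str, pos_thresh: float = 0.05, neg_thresh: float = -0.05) -> str:
    """Lexicon-based automatic labelling: VADER's compound score in [-1, 1]
    is thresholded into positive / negative / neutral classes."""
    compound = analyzer.polarity_scores(tweet)["compound"]
    if compound >= pos_thresh:
        return "positive"
    if compound <= neg_thresh:
        return "negative"
    return "neutral"

print(auto_label("The lockdown was awful, but the community support was amazing!"))
```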
2023
-
Multilevel depth-wise context attention network with atrous mechanism for segmentation of COVID19 affected regions. Abdul Qayyum, Mona Mazhar, Imran Razzak, and Mohamed Reda Bouadjenek. Neural Computing and Applications, Oct 2023. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of COVID-19, spread aggressively all over the world in just a few months, and multiple variants have since emerged that are far more contagious than the parent strain. Rapid and accurate diagnosis of COVID-19 and its variants is crucial for treatment, analysis of lung damage, and quarantine management. A deep learning-based solution for efficient and accurate diagnosis of COVID-19 and its variants using chest X-rays and computed tomography images could help counter its outbreak. This work presents a novel depth-wise residual network with an atrous mechanism for accurate segmentation and lesion location of COVID-19 affected areas using volumetric CT images. The proposed framework consists of 3D depth-wise and 3D residual squeeze-and-excitation blocks arranged in cascade and in parallel to uniformly capture multi-scale context (low-level detailed, mid-level comprehensive, and high-level rich semantic features). The squeeze-and-excitation block adaptively recalibrates channel-wise feature responses by explicitly modeling inter-dependencies between channels. We further introduce an atrous mechanism with different atrous rates as the bottom layer. Extensive experiments on benchmark CT datasets showed a considerable gain (5%) in accurate segmentation and lesion location of COVID-19 affected areas.
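The squeeze-and-excitation block the abstract describes is a standard construction; a textbook 3D sketch follows (generic, not the paper's code): global pooling "squeezes" each channel to one value, a small bottleneck MLP models inter-channel dependencies, and a sigmoid gate recalibrates the channel responses.

```python
import torch
import torch.nn as nn

class SEBlock3D(nn.Module):
    """Generic 3D squeeze-and-excitation block: per-channel global pooling,
    a bottleneck MLP, and a sigmoid gate that rescales each channel."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)  # squeeze: (b, c, D, H, W) -> (b, c, 1, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c = x.shape[:2]
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1, 1)
        return x * w  # excitation: channel-wise recalibration

x = torch.randn(2, 16, 8, 32, 32)  # (batch, channels, D, H, W) volumetric features
print(SEBlock3D(16)(x).shape)      # torch.Size([2, 16, 8, 32, 32])
```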
- ECIR
A Mask-Based Logic Rules Dissemination Method for Sentiment Classifiers. Shashank Gupta, Mohamed Reda Bouadjenek, and Antonio Robles-Kelly. In Advances in Information Retrieval, Oct 2023. Disseminating and incorporating logic rules inspired by domain knowledge into Deep Neural Networks (DNNs) is desirable to make their output causally interpretable, reduce data dependence, and provide some human supervision during training to prevent undesirable outputs. Several methods have been proposed for this purpose, but performing end-to-end training while keeping the DNN informed about logical constraints remains a challenging task. In this paper, we propose a novel method to disseminate logic rules in DNNs for sentence-level binary sentiment classification. In particular, we couple a Rule-Mask Mechanism with a DNN model that, given an input sequence, predicts a vector of binary values, one per token, capturing whether a linguistically motivated logic rule applies to the input sequence. We compare our method with a number of state-of-the-art baselines and demonstrate its effectiveness. We also release a new Twitter-based dataset specifically constructed to test logic rule dissemination methods and propose a new heuristic approach to provide automatic high-quality labels for the dataset.
- IPM
Towards understanding and mitigating unintended biases in language model-driven conversational recommendation. Tianshu Shen, Jiaru Li, Mohamed Reda Bouadjenek, Zheda Mai, and Scott Sanner. Information Processing & Management, Oct 2023. Conversational Recommendation Systems (CRSs) have recently started to leverage pretrained language models (LMs) such as BERT for their ability to semantically interpret a wide range of preference statement variations. However, pretrained LMs are prone to intrinsic biases in their training data, which may be exacerbated by biases embedded in domain-specific language data (e.g., user reviews) used to fine-tune LMs for CRSs. We study a simple LM-driven recommendation backbone (termed LMRec) of a CRS to investigate how unintended bias, i.e., bias due to language variations such as name references or indirect indicators of sexual orientation or location that should not affect recommendations, manifests in substantially shifted price and category distributions of restaurant recommendations. For example, offhand mention of names associated with the black community substantially lowers the price distribution of recommended restaurants, while offhand mentions of common male-associated names lead to an increase in recommended alcohol-serving establishments. While these results raise red flags regarding a range of previously undocumented unintended biases that can occur in LM-driven CRSs, there is fortunately a silver lining: we show that train-side masking and test-side neutralization of non-preferential entities nullifies the observed biases without significantly impacting recommendation performance.
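A minimal sketch of what test-side neutralization of non-preferential entities might look like. The name list, placeholder, and regex approach are illustrative assumptions; the paper works with controlled templates rather than this ad-hoc substitution.

```python
import re

# Hypothetical name list for illustration only.
GENDERED_NAMES = {"James", "Emily", "DeShawn", "Katelyn"}

def neutralize(request: str, placeholder: str = "my friend") -> str:
    """Replace non-preferential named entities before the recommendation
    model scores the request, so name-correlated language variation cannot
    shift the recommendations."""
    pattern = r"\b(" + "|".join(map(re.escape, GENDERED_NAMES)) + r")\b"
    return re.sub(pattern, placeholder, request)

print(neutralize("Looking for a dinner spot for me and DeShawn"))
# -> "Looking for a dinner spot for me and my friend"
```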
- PR
Overcoming weaknesses of density peak clustering using a data-dependent similarity measure. Zafaryab Rasool, Sunil Aryal, Mohamed Reda Bouadjenek, and Richard Dazeley. Pattern Recognition, Oct 2023. Density Peak Clustering (DPC) is a popular state-of-the-art clustering algorithm that requires pairwise (dis)similarity of data objects to detect arbitrarily shaped clusters. While it is shown to perform well for many applications, DPC remains: (i) not robust for datasets with clusters of different densities, and (ii) sensitive to changes in the units/scales used to represent data. These drawbacks are mainly due to the use of a data-independent similarity measure based on the Euclidean distance. In this paper, we address these issues by proposing an effective data-dependent similarity measure based on probability mass, which we call MP-Similarity, and by incorporating it in DPC to create MP-DPC, a data-dependent variant of DPC. We evaluate and compare MP-DPC against diverse baselines using several clustering metrics and datasets. Our experiments demonstrate that: (a) MP-DPC produces better clustering results than DPC using the Euclidean distance and existing data-dependent similarity measures; (b) MP-Similarity coupled with a Shared-Nearest-Neighbor-based density metric in DPC further enhances the quality of clustering results; and (c) unlike DPC with existing data-independent and data-dependent similarity measures, MP-DPC is robust to changes in the units/scales used to represent data. Our findings suggest that MP-Similarity provides a more viable solution for DPC on datasets with unknown distributions or units/scales of features, which is often the case in many real-world applications.
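For context, the core DPC quantities take any pairwise dissimilarity matrix as input, which is exactly where a data-dependent measure such as MP-Similarity can be plugged in place of Euclidean distance. A compact sketch of the standard algorithm (not the paper's MP-Similarity itself) follows.

```python
import numpy as np

def density_peaks(D: np.ndarray, d_c: float):
    """Core of Density Peak Clustering given a dissimilarity matrix D:
    rho_i   = local density (neighbours within the cutoff d_c),
    delta_i = distance to the nearest point of higher density.
    Cluster centres are points where both rho and delta are large
    (ties in rho are left unhandled in this sketch)."""
    n = D.shape[0]
    rho = (D < d_c).sum(axis=1) - 1  # exclude self
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = D[i].max() if higher.size == 0 else D[i, higher].min()
    return rho, delta

# Toy usage with Euclidean distances; swap D for an MP-Similarity matrix.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
rho, delta = density_peaks(D, d_c=0.5)
print(rho, delta)
```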
- TWeb
A User-Centric Analysis of Social Media for Stock Market Prediction. Mohamed Reda Bouadjenek, Scott Sanner, and Ga Wu. ACM Trans. Web, Mar 2023. Social media platforms such as Twitter or StockTwits are widely used for sharing stock market opinions between investors, traders, and entrepreneurs. Empirically, previous work has shown that the content posted on these social media platforms can be leveraged to predict various aspects of stock market performance. Nonetheless, actors on these social media platforms may not always have altruistic motivations and may instead seek to influence stock trading behavior through the (potentially misleading) information they post. While a lot of previous work has sought to analyze how social media can be used to predict the stock market, many questions remain regarding the quality of the predictions and the behavior of active users on these platforms. To this end, this article seeks to address a number of open research questions: Which social media platform is more predictive of stock performance? What posted content is actually predictive, and over what time horizon? How does stock market posting behavior vary among different users? Are all users trustworthy, or do some users’ predictions consistently mislead about the true stock movement? To answer these questions, we analyzed data from Twitter and StockTwits covering almost 5 years of posted messages spanning 2015 to 2019. The results of this large-scale study provide a number of important insights, among which we present the following: (i) StockTwits is a more predictive source of information than Twitter, leading us to focus our analysis on StockTwits; (ii) on StockTwits, users’ self-labeled sentiments are correlated with the stock market but are only slightly predictive in aggregate over the short term; (iii) there are at least three clear types of temporal predictive behavior for users over a 144-day horizon: short, medium, and long term; and (iv) users who are reliably wrong tend to exhibit what we conjecture to be “bot-like” post content, and their removal from the data tends to improve stock market predictions from self-labeled content.
- KnoSys
PERCY: A post-hoc explanation-based score for logic rule dissemination consistency assessment in sentiment classification. Shashank Gupta, Mohamed Reda Bouadjenek, and Antonio Robles-Kelly. Knowledge-Based Systems, Mar 2023. Disseminating and incorporating logic rules into deep neural networks has been extensively explored for sentiment classification in recent years. In particular, most methods and algorithms proposed for this purpose rely on a specific component that aims to capture and model logic rules, followed by a sequence model to process the input sequence. While the authors of these methods claim that they effectively capture syntactic structures that affect sentiment classification, they only show improvements in accuracy to support their claims, without further analysis. Focusing on various syntactic structures, particularly contrastive discourse relations such as the A-but-B structure, we introduce the PERCY score, a novel Post-hoc Explanation-based Rule ConsistencY score, to analyze and study the ability of several of these methods to identify these structures in a given sentence and to make their classification decisions based on the appropriate conjunct. Specifically, we explore the use of model-agnostic post-hoc explanation frameworks to explain the predictions of any classifier in an interpretable and faithful manner. These model explainability frameworks provide feature attribution scores to estimate each word’s impact on the final classification decision, which are then combined to check whether the model has based its decision on the right conjunct. Our experiments show that (a) accuracy, or any other performance metric, can be misleading in assessing the ability of logic rule dissemination methods to base their decisions on the right conjunct; (b) not all analyzed methods effectively capture syntactic structures; (c) often, the underlying sequence model is what captures the structure; and (d) for the best method, less than 25% of the test examples are classified based on the appropriate conjunct, indicating that much research remains to be done on this topic. Finally, we experimentally demonstrate that the calculated PERCY scores are robust and stable w.r.t. the feature-attribution frameworks used.
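A toy sketch of the check PERCY formalizes: given per-token attribution scores from a post-hoc explainer (e.g., LIME or SHAP), measure how much of the positive attribution mass falls on the B conjunct of an "A but B" sentence. The scoring below is illustrative, not the paper's exact definition.

```python
def b_conjunct_share(tokens, attributions):
    """Share of positive attribution mass on the B conjunct of an
    'A but B' sentence; returns None when the rule does not apply.
    `attributions` would come from a post-hoc explainer such as LIME."""
    if "but" not in tokens:
        return None
    b_start = tokens.index("but") + 1
    pos = [max(a, 0.0) for a in attributions]
    total = sum(pos)
    return sum(pos[b_start:]) / total if total > 0 else 0.0

tokens = "the plot was dull but the acting was superb".split()
attributions = [0.0, 0.1, 0.0, 0.6, 0.0, 0.0, 0.2, 0.1, 0.9]  # e.g., from LIME
print(f"{b_conjunct_share(tokens, attributions):.2f}")  # 0.63: most mass on B
```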
2022
-
ModelOps for enhanced decision-making and governance in emergency control rooms. Kay Lefevre, Chetan Arora, Kevin Lee, Arkady Zaslavsky, Mohamed Reda Bouadjenek, Ali Hassani, and Imran Razzak. Environment Systems and Decisions, Mar 2022. For mission-critical (MC) applications such as bushfire emergency management systems (EMS), understanding the current situation as a disaster unfolds is critical to saving lives, infrastructure, and the environment. Incident control-room operators manage complex information and systems, especially with the emergence of Big Data. They increasingly make decisions supported by artificial intelligence (AI) and machine learning (ML) tools for data analysis, prediction, and decision-making. As the volume, speed, and complexity of information increase due to more frequent fire events, greater availability of myriad IoT sensors, smart devices, satellite data, and burgeoning use of social media, the advances in AI and ML that help manage Big Data and support decision-making are increasingly perceived as a “black box”. This paper aims to scope the requirements for bushfire EMS to improve Big Data management and governance of AI/ML. An analysis of ModelOps technology, used increasingly in the commercial sector, is undertaken to determine which components might be fit for purpose. The result is a novel set of ModelOps features, EMS requirements, and an EMS-ModelOps framework that resolves more than 75% of issues whilst being sufficiently generic to apply to other types of mission-critical applications.
-
Multiresolutional ensemble PartialNet for Alzheimer detection using magnetic resonance imaging data. Imran Razzak, Saeeda Naz, Abida Ashraf, Fahmi Khalifa, Mohamed Reda Bouadjenek, and Shahid Mumtaz. International Journal of Intelligent Systems, Mar 2022. Alzheimer’s disease (AD) is an irreversible and progressive disorder in which large numbers of brain cells and their connections degenerate and die, eventually destroying memory and other important mental functions affecting thinking, language, judgment, and behavior. No single test can effectively determine AD; however, CT and magnetic resonance imaging (MRI) can be used to observe the decrease in size of different brain areas (mainly the temporal and parietal lobes). This paper proposes an integrative deep ensemble learning framework to obtain better predictive performance for AD diagnosis. Unlike DenseNet, we present a multiresolutional ensemble PartialNet tailored to Alzheimer detection using brain MRIs. PartialNet incorporates the properties of identity mappings, diversified depth, and deep supervision, and thus supports feature reuse, which in turn results in better learning. Additionally, the proposed ensemble PartialNet demonstrates better characteristics in terms of vanishing gradients and diminishing forward flow, with better training time and a lower number of parameters compared with DenseNet. Experiments performed on the benchmark Alzheimer’s Disease Neuroimaging Initiative dataset showed considerable performance gains (over 2% for multiclass and over 1.2% for binary-class AD detection) in comparison to state-of-the-art methods.
- MDM
Jarvis: A Voice-based Context-as-a-Service Mobile Tool for a Smart Home Environment. Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Ali Hassani, Imran Razzak, Kevin Lee, Chetan Arora, and Arkady Zaslavsky. In 2022 23rd IEEE International Conference on Mobile Data Management (MDM), Mar 2022. Best Demo Award.
- PeerJ
A longitudinal study of topic classification on Twitter. Mohamed Reda Bouadjenek, Scott Sanner, Zahra Iman, Lexing Xie, and Daniel Xiaoliang Shi. PeerJ Computer Science, Mar 2022. Twitter represents a massively distributed information source over topics ranging from social and political events to entertainment and sports news. While recent work has suggested this content can be narrowed down to the personalized interests of individual users by training topic filters using standard classifiers, many open questions remain about the efficacy of such classification-based filtering approaches. For example, a year or more after training, how well do such classifiers generalize to future novel topical content, and are such results stable across a range of topics? In addition, how robust is a topic classifier over the time horizon, e.g., can a model trained in one year be used for making predictions in the subsequent year? Furthermore, which features, feature classes, and feature attributes are most critical for long-term classifier performance? To answer these questions, we collected a corpus of over 800 million English tweets via the Twitter streaming API during 2013 and 2014 and learned topic classifiers for 10 diverse themes ranging from social issues to celebrity deaths to the “Iran nuclear deal”. The results of this long-term study provide insights including: (i) classifiers generalize to novel topical content with high precision, though with degradation over time; (ii) hashtags and simple terms are the most informative features; (iii) removing tweets containing training hashtags improves generalization; and (iv) tweet volume by user correlates with informativeness more than follower count.
- CLEF
An Analysis of Logic Rule Dissemination in Sentiment Classifiers. Shashank Gupta, Mohamed Reda Bouadjenek, and Antonio Robles-Kelly. In Experimental IR Meets Multilinguality, Multimodality, and Interaction, Mar 2022. Disseminating and incorporating logic rules in deep neural networks has been extensively explored for sentiment classification. Methods proposed for this goal rely on a component that aims to capture and model logic rules, followed by a sequence model to process the input sequence. While these methods claim to effectively capture syntactic structures that affect sentiment, they only show improvement in terms of accuracy to support their claims, with no further analysis. Focusing on the A-but-B rule, we use the PERCY metric (a recently developed Post-hoc Explanation-based score for logic Rule dissemination ConsistencY assessment) to analyze and study the ability of these methods to identify the A-but-B structure and to make their classification decisions based on the B conjunct. PERCY proceeds by estimating feature attribution scores using LIME, a model-agnostic framework that aims to explain the predictions of any classifier in an interpretable and faithful manner. Our experiments show that (a) accuracy is misleading in assessing these methods, (b) not all of these methods effectively capture the A-but-B structure, (c) often, the underlying sequence model is what captures the syntactic structure, and (d) the best method classifies less than 25% of test examples based on the B conjunct.
- SIGIR
Mitigating the Filter Bubble While Maintaining Relevance: Targeted Diversification with VAE-based Recommender Systems. Zhaolin Gao, Tianshu Shen, Zheda Mai, Mohamed Reda Bouadjenek, Isaac Waller, Ashton Anderson, Ron Bodkin, and Scott Sanner. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, Mar 2022. Online recommendation systems are prone to creating filter bubbles, whereby users are only recommended content narrowly aligned with their historical interests. In the case of media recommendation, this can reinforce political polarization by recommending topical content (e.g., on the economy) at one extreme end of the political spectrum, even though the topic has broad coverage from multiple political viewpoints that would provide a more balanced and informed perspective for the user. Historically, Maximal Marginal Relevance (MMR) has been used to diversify result lists and even mitigate filter bubbles, but it suffers from three key drawbacks: (1) MMR directly sacrifices relevance for diversity; (2) MMR typically diversifies across all content and not just targeted dimensions (e.g., political polarization); and (3) MMR is inefficient in practice due to the need to compute pairwise similarities between recommended items. To simultaneously address these limitations, we propose a novel methodology that trains Concept Activation Vectors (CAVs) for targeted topical dimensions (e.g., political polarization). We then modulate the latent embeddings of user preferences in a state-of-the-art VAE-based recommender system to diversify along the targeted dimension while preserving topical relevance across orthogonal dimensions. Our experiments show that our Targeted Diversification VAE-based Collaborative Filtering (TD-VAE-CF) methodology better preserves the relevance of content to user preferences across a range of diversification levels in comparison to both untargeted and targeted variations of Maximal Marginal Relevance (MMR); TD-VAE-CF is also much more computationally efficient than the post-hoc re-ranking approach of MMR.
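To ground the drawbacks listed above, here is the classic MMR selection loop the paper compares against. The lam parameter directly trades relevance for diversity (drawback 1), and each pick requires similarity scans over the already-selected set (drawback 3). The toy data is illustrative.

```python
import numpy as np

def mmr(relevance, sim, k, lam=0.7):
    """Classic Maximal Marginal Relevance re-ranking: each pick maximizes
    lam * relevance - (1 - lam) * (max similarity to items already chosen)."""
    n = len(relevance)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        rest = [i for i in range(n) if i not in selected]
        scores = [lam * relevance[i] - (1 - lam) * max(sim[i][j] for j in selected)
                  for i in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected

rel = np.array([0.9, 0.85, 0.8, 0.3])
sim = np.array([[1.0, 0.95, 0.2, 0.1],
                [0.95, 1.0, 0.25, 0.1],
                [0.2, 0.25, 1.0, 0.05],
                [0.1, 0.1, 0.05, 1.0]])
print(mmr(rel, sim, k=3))  # [0, 2, 1]: the dissimilar item 2 beats near-duplicate 1
```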
- CIKM
Marine-tree: A Large-scale Marine Organisms Dataset for Hierarchical Image Classification. Tanya Boone-Sifuentes, Asef Nazari, Imran Razzak, Mohamed Reda Bouadjenek, Antonio Robles-Kelly, Daniel Ierodiaconou, and Elizabeth S. Oh. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, Mar 2022. This paper presents Marine-tree, a large-scale hierarchically annotated dataset for marine organism classification. Marine-tree contains more than 160k annotated images divided into 60 classes organized in a hierarchy-tree structure using an adapted CATAMI (Collaborative and Automated Tools for the Analysis of Marine Imagery and video) classification scheme. Images were meticulously collected by scuba divers using the RLS (Reef Life Survey) methodology and later annotated by experts in the field. We also propose a hierarchical loss function that can be applied to any multi-level hierarchical classification model, which takes into account the parent-child relationship between predictions and uses it to penalize inconsistent predictions. Experimental results demonstrate that Marine-tree and the proposed hierarchical loss function are valuable contributions to both underwater imagery research and hierarchical classification.
- CIKM
A Mask-based Output Layer for Multi-level Hierarchical Classification. Tanya Boone-Sifuentes, Mohamed Reda Bouadjenek, Imran Razzak, Hakim Hacid, and Asef Nazari. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, Mar 2022. This paper proposes a novel mask-based output layer for multi-level hierarchical classification, addressing the limitations of existing methods, which (i) often do not embed the taxonomy structure being used, (ii) use a complex backbone neural network with n disjoint output layers that do not constrain each other, (iii) may output predictions that are inconsistent with the taxonomy in place, and (iv) often have a fixed value of n. Specifically, we propose a model-agnostic output layer that embeds the taxonomy and that can be combined with any model. Our proposed output layer implements a top-down divide-and-conquer strategy through a masking mechanism to enforce that predictions comply with the embedded hierarchy structure. Focusing on image classification, we evaluate the performance of our proposed output layer on three different datasets, each with a three-level hierarchical structure. Experiments on these datasets show that our mask-based output layer improves several multi-level hierarchical classification models on various performance metrics.
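A minimal two-level sketch of the top-down masking idea as I read the abstract (not the paper's implementation): child logits are masked so that only children of the predicted parent remain eligible, which enforces taxonomy-consistent outputs by construction. The class counts and module name are hypothetical.

```python
import torch
import torch.nn as nn

class MaskedHierarchicalHead(nn.Module):
    """Two-level mask-based output head: the level-1 argmax selects which
    level-2 classes are eligible, so predictions cannot violate the taxonomy."""
    def __init__(self, in_dim, n_parents, n_children, child_mask):
        super().__init__()
        self.parent_head = nn.Linear(in_dim, n_parents)
        self.child_head = nn.Linear(in_dim, n_children)
        # child_mask[p, c] = 1 iff child class c belongs to parent class p.
        self.register_buffer("child_mask", child_mask)

    def forward(self, h):
        parent_logits = self.parent_head(h)
        allowed = self.child_mask[parent_logits.argmax(dim=1)]  # (batch, n_children)
        child_logits = self.child_head(h).masked_fill(allowed == 0, float("-inf"))
        return parent_logits, child_logits

mask = torch.tensor([[1, 1, 0, 0, 0], [0, 0, 1, 1, 1]], dtype=torch.float)
head = MaskedHierarchicalHead(8, n_parents=2, n_children=5, child_mask=mask)
parent_logits, child_logits = head(torch.randn(3, 8))
print(child_logits.softmax(dim=1))  # zero probability outside the chosen branch
```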
2021
- SSPR
Feature Extraction Functions for Neural Logic Rule Learning. Shashank Gupta, Antonio Robles-Kelly, and Mohamed Reda Bouadjenek. In Structural, Syntactic, and Statistical Pattern Recognition, Mar 2021. Combining symbolic human knowledge with neural networks provides a rule-based ante-hoc explanation of the output. In this paper, we propose feature extraction functions for integrating human knowledge, abstracted as logic rules, into the predictive behaviour of a neural network. These functions are embodied as programming functions, which represent the applicable domain knowledge as a set of logical instructions and provide a modified distribution of independent features on the input data. Unlike other existing neural logic approaches, the programmatic nature of these functions means they do not require any special mathematical encoding, which makes our method very general and flexible. We illustrate the performance of our approach for sentiment classification and compare our results to those obtained using two baselines.
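To illustrate what "a programming function encoding a logic rule" could look like, here is a hedged sketch for the a-but-b rule: a plain Python function, with no special mathematical encoding, that re-weights the input features to emphasize the conjunct after "but". The weight values and the multiplicative use downstream are illustrative assumptions.

```python
def but_rule_features(tokens):
    """Feature-extracting function for the a-but-b rule: returns a per-token
    weight that down-weights the A conjunct and keeps the B conjunct, which
    tends to dominate sentence-level sentiment. Weights are illustrative."""
    if "but" in tokens:
        b_start = tokens.index("but") + 1
        return [0.2 if i < b_start else 1.0 for i in range(len(tokens))]
    return [1.0] * len(tokens)  # rule not applicable: leave features unchanged

tokens = "the movie was slow but the ending was brilliant".split()
print(list(zip(tokens, but_rule_features(tokens))))
# downstream: multiply token embeddings by these weights before the classifier
```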
-
Linked Data Triples Enhance Document Relevance Classification. Dinesh Nagumothu, Peter W. Eklund, Bahadorreza Ofoghi, and Mohamed Reda Bouadjenek. Applied Sciences, Mar 2021. Standardized approaches to relevance classification in information retrieval use generative statistical models to identify the presence or absence of certain topics that might make a document relevant to the searcher. These approaches have been used to better predict relevance on the basis of what the document is “about”, rather than a simple-minded analysis of the bag of words contained within the document. In more recent times, this idea has been extended by using pre-trained deep learning models and text representations, such as GloVe or BERT, which use an external corpus as a knowledge base that conditions the model to help predict what a document is about. This paper adopts a hybrid approach that leverages the structure of knowledge embedded in a corpus. In particular, the paper reports on experiments where linked data triples (subject-predicate-object), constructed from natural language elements, are derived via deep learning and evaluated as additional latent semantic features for a relevant-document classifier in a customized news-feed website. The research is a synthesis of current thinking on deep learning models in NLP and information retrieval with the predicate structure used in semantic web research. Our experiments indicate that linked data triples increase the F-score of the baseline GloVe representations by 6% and show significant improvement over state-of-the-art models such as BERT. The findings are tested and empirically validated on an experimental dataset and on two standardized pre-classified news sources, namely the Reuters and 20 Newsgroups datasets.
- WWW
A Workflow Analysis of Context-driven Conversational Recommendation. Shengnan Lyu, Arpit Rana, Scott Sanner, and Mohamed Reda Bouadjenek. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, Mar 2021. A number of recent works have made seminal contributions to the understanding of user intent and recommender interaction in conversational recommendation. However, to date, these studies have not focused explicitly on the context-driven interaction that underlies the typical use of more pervasive Question Answering (QA)-focused conversational assistants like Amazon Alexa, Apple Siri, and Google Assistant. In this paper, we aim to understand the general workflow of natural context-driven conversational recommendation that arises from a pairwise study of a human user interacting with a human simulating the role of a recommender. In our analysis of this intrinsically organic human-to-human conversation, we observe a clear structure in the interaction workflow, consisting of a preference elicitation and refinement stage, followed by inquiry and critiquing stages after the first recommendation. To better understand the nature of these stages and the conversational flow within them, we augment existing taxonomies of intent and action to label all interactions at each stage and analyze the workflow. From this analysis, we identify distinct conversational characteristics of each stage, e.g., (i) the preference elicitation stage consists of significant iteration to clarify, refine, and obtain a mutual understanding of preferences; (ii) the inquiry and critiquing stage consists of extensive informational queries to understand features of the recommended item and to (implicitly) specify critiques; and (iii) explanation appears to drive a substantial portion of the post-recommendation interaction, suggesting that, beyond justification, explanation serves a critical role in directing the evolving conversation itself. Altogether, we contribute a novel qualitative and quantitative analysis of workflow in conversational recommendation that further refines our understanding of this important frontier of conversational systems and suggests a number of critical avenues for further research to better automate natural recommendation conversations.
2020
-
Evaluation of Machine Learning Algorithms for Predicting Readmission After Acute Myocardial Infarction Using Routinely Collected Clinical Data. Shagun Gupta, Dennis T. Ko, Paymon Azizi, Mohamed Reda Bouadjenek, Maria Koh, Alice Chong, Peter C. Austin, and Scott Sanner. Canadian Journal of Cardiology, Mar 2020. Background: The ability to predict readmission accurately after hospitalization for acute myocardial infarction (AMI) is limited in current statistical models. Machine learning (ML) methods have shown improved predictive ability in various clinical contexts, but their utility in predicting readmission after hospitalization for AMI is unknown. Methods: Using detailed clinical information collected from patients hospitalized with AMI, we evaluated 6 ML algorithms (logistic regression, naïve Bayes, support vector machines, random forest, gradient boosting, and deep neural networks) to predict readmission within 30 days and 1 year of discharge. A nested cross-validation approach was used to develop and test models. We used C-statistics to compare discriminatory capacity, whereas the Brier score was used to indicate overall model performance. Model calibration was assessed using calibration plots. Results: The 30-day readmission rate was 16.3%, whereas the 1-year readmission rate was 45.1%. For 30-day readmission, the discriminative ability of the ML models was modest (C-statistic 0.641; 95% confidence interval [CI], 0.621-0.662 for gradient boosting) and did not outperform previously reported methods. For 1-year readmission, different ML models showed moderate performance, with C-statistics around 0.72. Despite modest discriminatory capabilities, the observed readmission rates were markedly higher in the tenth decile of predicted risk compared with the first decile for both 30-day and 1-year readmission. Conclusions: Despite including detailed clinical information and evaluating various ML methods, these models did not have better discriminatory ability to predict readmission outcomes compared with previously reported methods.
Dans le cas de la réadmission à 30 jours, la capacité de discrimination des modèles d’apprentissage automatique était modeste (statistique C : 0,641; intervalle de confiance [IC] à 95 % : 0,621-0,662 pour le boosting par descente de gradient fonctionnelle) et n’était pas supérieure à celle des méthodes déjà utilisées. Dans le cas de la réadmission à 1 an, différents modèles d’apprentissage automatique se sont révélés modérément efficaces, la statistique C se chiffrant à environ 0,72. En dépit des modestes capacités de discrimination des différentes méthodes, les taux de réadmission observés étaient nettement plus élevés dans le dixième décile du risque prédit comparativement à ceux du premier décile, pour la réadmission à 30 jours comme pour la réadmission à 1 an. Conclusions Malgré le recours à des données cliniques détaillées et à différentes méthodes d’apprentissage automatique, les modèles évalués n’ont pas montré une capacité de discrimination supérieure à celle des méthodes déjà utilisées pour prédire la réadmission.
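To make the evaluation protocol above concrete, here is a minimal Python sketch of nested cross-validation scored with the C-statistic (ROC AUC) and Brier score, using scikit-learn. The data, hyperparameter grid, and fold counts are placeholder assumptions, not the study's actual configuration.

```python
# Minimal sketch of a nested cross-validation protocol: an inner CV loop
# tunes hyperparameters, an outer CV loop estimates generalization, and
# models are compared on the C-statistic (ROC AUC) and the Brier score.
# The random data below is a placeholder for the clinical features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate

X = np.random.rand(500, 20)        # placeholder clinical features
y = np.random.randint(0, 2, 500)   # placeholder readmission labels

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

model = GridSearchCV(
    GradientBoostingClassifier(),
    param_grid={"n_estimators": [100, 300], "max_depth": [2, 3]},
    cv=inner_cv, scoring="roc_auc",
)
scores = cross_validate(
    model, X, y, cv=outer_cv,
    scoring={"c_statistic": "roc_auc", "brier": "neg_brier_score"},
)
print(scores["test_c_statistic"].mean(), -scores["test_brier"].mean())
```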
- InfoSys
Relevance- and interface-driven clustering for visual information retrievalMohamed Reda Bouadjenek, Scott Sanner, and Yihao DuInformation Systems, Mar 2020Search results of spatio-temporal data are often displayed on a map, but when the number of matching search results is large, it can be time-consuming to individually examine all results, even when using methods such as filtered search to narrow the content focus. This suggests the need to aggregate results via a clustering method. However, standard unsupervised clustering algorithms like K-means (i) ignore relevance scores that can help with the extraction of highly relevant clusters, and (ii) do not necessarily optimize search results for purposes of visual presentation. In this article, we address both deficiencies by framing the clustering problem for search-driven user interfaces in a novel optimization framework that (i) aims to maximize the relevance of aggregated content according to cluster-based extensions of standard information retrieval metrics and (ii) defines clusters via constraints that naturally reflect interface-driven desiderata of spatial, temporal, and keyword coherence that do not require complex ad-hoc distance metric specifications as in K-means. After comparatively benchmarking algorithmic variants of our proposed approach – RadiCAL – in offline experiments, we undertake a user study with 24 subjects to evaluate whether RadiCAL improves human performance on visual search tasks in comparison to K-means clustering and a filtered search baseline. Our results show that (a) our binary partitioning search (BPS) variant of RadiCAL is fast, near-optimal, and extracts higher-relevance clusters than K-means, and (b) clusters optimized via RadiCAL result in faster search task completion with higher accuracy and lower workload, yielding high effectiveness, efficiency, and user satisfaction relative to the alternatives.
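The binary-partitioning idea can be pictured with a short Python sketch: recursively split results along their widest spatio-temporal axis and keep partitions whose mean relevance clears a threshold. This mirrors only the high-level intuition under assumed inputs; it is not the RadiCAL algorithm itself.

```python
# Illustrative binary-partitioning-style cluster extraction: split search
# results along the widest spatio-temporal axis and keep coherent boxes
# whose mean relevance is high enough.
import numpy as np

def partition(points, scores, min_size=5, threshold=0.5):
    """points: (n, d) spatio-temporal coordinates; scores: (n,) relevance in [0, 1]."""
    if len(points) < min_size:
        return []
    if scores.mean() >= threshold:
        return [(points, float(scores.mean()))]  # coherent, high-relevance box
    axis = int(np.argmax(points.max(axis=0) - points.min(axis=0)))  # widest axis
    cut = np.median(points[:, axis])
    left = points[:, axis] <= cut
    if left.all() or not left.any():  # degenerate split; stop recursing
        return []
    return (partition(points[left], scores[left], min_size, threshold)
            + partition(points[~left], scores[~left], min_size, threshold))

clusters = partition(np.random.rand(200, 3), np.random.rand(200))
```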
2019
-
Personalized Social Query Expansion Using Social AnnotationsMohamed Reda Bouadjenek, Hakim Hacid, and Mokrane BouzeghoubMar 2019Query expansion is a query pre-processing technique that adds to a given query terms that are likely to occur in relevant documents, in order to improve information retrieval accuracy. A key problem to solve is “how to identify the terms to be added to a query?” While considering social tagging systems as a data source, we propose an approach that selects terms based on (i) the semantic similarity between tags composing a query, (ii) a social proximity between the query and the user for a personalized expansion, and (iii) a strategy for expanding, on the fly, user queries. We demonstrate the effectiveness of our approach through an intensive evaluation on three large public datasets crawled from delicious, Flickr, and CiteULike. We show that the expanded queries built by our method provide more accurate results than the initial queries, increasing MAP by 10% to 16% on the three datasets. We also compare our method to three state-of-the-art baselines, and we show that our query expansion method yields a significant improvement in MAP, with a boost of between 5% and 18%.
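A minimal Python sketch of the expansion principle, assuming precomputed tag-tag similarity and user-tag affinity structures derived from the folksonomy (the names and the linear blend below are illustrative assumptions, not the paper's scoring function):

```python
# Rank candidate tags by a blend of (i) semantic similarity to the query tags
# and (ii) social proximity of the tag to the issuing user, then append the
# top-k candidates to the query. tag_sim and user_tag_affinity are assumed
# precomputed dictionaries: tag -> {tag: sim} and user -> {tag: affinity}.
def expand_query(query_tags, user, tag_sim, user_tag_affinity, alpha=0.7, k=5):
    candidates = {}
    for t in tag_sim:                      # all known tags
        if t in query_tags:
            continue
        semantic = max(tag_sim[t].get(q, 0.0) for q in query_tags)
        social = user_tag_affinity.get(user, {}).get(t, 0.0)
        candidates[t] = alpha * semantic + (1 - alpha) * social
    top = sorted(candidates, key=candidates.get, reverse=True)[:k]
    return list(query_tags) + top
```

Setting alpha closer to 1 favors purely semantic expansion; lowering it personalizes the expansion toward the user's own tagging behavior.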
- CHIIR
Relevance-driven Clustering for Visual Information Retrieval on TwitterMohamed Reda Bouadjenek and Scott SannerIn Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, Glasgow, Scotland UK, Mar 2019Geo-temporal visualization of Twitter search results is a challenging task since the simultaneous display of all matching tweets would result in a saturated and unreadable display. In such settings, clustering search results can assist users to scan only a few coherent groups of related tweets rather than many individual tweets. However, in practice, the use of unsupervised clustering methods such as K-Means does not necessarily guarantee that the clusters themselves are relevant. Therefore, we develop a novel method of relevance-driven clustering for visual information retrieval to supply users with highly relevant clusters representing different information perspectives of their queries. We specifically propose a Visual Twitter Information Retrieval (Viz-TIR) tool for relevance-driven clustering and ranking of Twitter search results. At the heart of Viz-TIR is a fast greedy algorithm that optimizes an approximation of an expected F1-Score metric to generate these clusters. We demonstrate its effectiveness w.r.t. K-Means and a baseline method that shows all top matching results on a scenario related to searching natural disasters in US-based Twitter data spanning 2013 and 2014. Our demo shows that Viz-TIR is easy to use and more precise in extracting geo-temporally coherent clusters given search queries in comparison to K-Means, thus aiding the user in visually searching and browsing social network content. Overall, we believe this work enables new opportunities for the synthesis of information retrieval as well as combined relevance and display-aware optimization techniques to support query-adaptive visual information exploration interfaces.
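The greedy idea can be sketched as follows: sort candidate tweets by predicted relevance and grow a cluster while a plug-in approximation of expected F1 keeps improving. This is an illustrative reading of the abstract in Python, not the Viz-TIR implementation.

```python
# Greedy cluster growth under an expected-F1 approximation: expected true
# positives are estimated by summing relevance probabilities, and the cluster
# stops growing once the approximate expected F1 no longer improves.
def greedy_expected_f1(probs, total_relevant=None):
    """probs: predicted relevance probabilities of candidate tweets."""
    probs = sorted(probs, reverse=True)
    R = total_relevant if total_relevant is not None else sum(probs)
    best, best_f1, tp = [], 0.0, 0.0
    for i, p in enumerate(probs, start=1):
        tp += p  # expected number of relevant items in the cluster so far
        if tp == 0 or R == 0:
            continue
        precision, recall = tp / i, tp / R
        f1 = 2 * precision * recall / (precision + recall)
        if f1 <= best_f1:
            break  # expected F1 stopped improving; stop growing the cluster
        best_f1, best = f1, probs[:i]
    return best, best_f1
```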
-
Automated assessment of biological database assertions using the scientific literatureMohamed Reda Bouadjenek, Justin Zobel, and Karin VerspoorBMC Bioinformatics, Mar 2019The large biological databases such as GenBank contain vast numbers of records, the content of which is substantively based on external resources, including published literature. Manual curation is used to establish whether the literature and the records are indeed consistent. We explore in this paper an automated method for assessing the consistency of biological assertions, to assist biocurators, which we call BARC, Biocuration tool for Assessment of Relation Consistency. In this method a biological assertion is represented as a relation between two objects (for example, a gene and a disease); we then use our novel set-based relevance algorithm SaBRA to retrieve pertinent literature, and apply a classifier to estimate the likelihood that this relation (assertion) is correct.
- SDM
A Novel Regularizer for Temporally Stable Learning with an Application to Twitter Topic ClassificationYakun Wang, Ga Wu, Mohamed Reda Bouadjenek, Scott Sanner, Sen Su, and Zhongbao ZhangMar 2019Supervised topic classifiers for Twitter and other media sources are important in a variety of long-term topic tracking tasks. Unfortunately, over long periods of time, features that are predictive during the training period may prove ephemeral and fail to generalize to prediction at future times. For example, if we trained a classifier to identify tweets concerning the topic of “Celebrity Death”, individual celebrity names and terms associated with these celebrities such as “Nelson Mandela” or “South Africa” would prove to be temporally unstable since they would not generalize over long periods of time; in contrast, terms like “RIP” (rest in peace) would prove to be temporally stable predictors of this topic over long periods of time. In this paper, we aim to design supervised learning methods for Twitter topic classifiers that are capable of automatically downweighting temporally unstable features to improve future generalization. To do this, we first begin with an oracular approach that chooses temporally stable features based on knowledge of both train and test data labels. We then search for feature metrics evaluated on only the training data that are capable of recovering the temporally stable features identified by our oracular definition. We next embed the top-performing metric as a temporal stability regularizer in logistic regression with the important property that the overall training objective retains convexity, hence enabling a globally optimal solution. Finally, we train our topic classifiers on 6 Twitter topics over roughly one year of data and evaluate on the following year of data, showing that logistic regression with our temporal stability regularizer generally outperforms logistic regression without such regularization across the full precision-recall continuum. Overall, these results establish a novel regularizer for training long-term temporally stable topic classifiers for Twitter and beyond.
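A minimal Python sketch of such a convex objective, assuming per-feature instability scores r precomputed from the training period: logistic loss plus a positively weighted quadratic penalty. Because the penalty is a positively weighted quadratic, the overall objective stays convex, as the abstract notes.

```python
# Logistic regression with a per-feature temporal-stability penalty: feature j
# is penalized in proportion to its measured instability r[j], so temporally
# unstable features are downweighted. The instability scores are assumed
# precomputed; random data would stand in for tweets here.
import numpy as np

def objective_and_grad(w, X, y, r, lam=0.1):
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    penalty = lam * np.sum(r * w ** 2)            # temporal-stability regularizer
    grad = X.T @ (p - y) / len(y) + 2 * lam * r * w
    return loss + penalty, grad

def fit(X, y, r, lr=0.1, steps=500):
    # Plain gradient descent; any convex solver would do since the objective is convex.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        _, g = objective_and_grad(w, X, y, r)
        w -= lr * g
    return w
```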
- SIGIR
One-Class Collaborative Filtering with the Queryable Variational AutoencoderGa Wu, Mohamed Reda Bouadjenek, and Scott SannerIn Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, Mar 2019Variational Autoencoder (VAE) based methods for Collaborative Filtering (CF) demonstrate remarkable performance for one-class (implicit negative) recommendation tasks by extending autoencoders with relaxed but tractable latent distributions. Explicitly modeling a latent distribution over user preferences allows VAEs to learn user and item representations that not only reproduce observed interactions, but also generalize them by leveraging learning from similar users and items. Unfortunately, VAE-CF can exhibit suboptimal learning properties; e.g., VAE-CFs will increase their prediction confidence as they receive more preferences per user, even when those preferences may vary widely and create ambiguity in the user representation. To address this issue, we propose a novel Queryable Variational Autoencoder (Q-VAE) variant of the VAE that explicitly models arbitrary conditional relationships between observations. The proposed model appropriately increases uncertainty (rather than reduces it) in cases where a large number of user preferences may lead to an ambiguous user representation. Our experiments on two benchmark datasets show that the Q-VAE generally performs comparably to or outperforms VAE-based recommenders as well as other state-of-the-art approaches and is generally competitive across the user preference density spectrum, where other methods peak for certain preference density levels.
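The conditional idea can be illustrated with a short PyTorch sketch: encode only an observed subset (a "query") of a user's binary interaction vector and reconstruct the held-out remainder. Layer sizes and the masking scheme are assumptions for illustration, not the paper's architecture.

```python
# Illustrative conditional VAE for one-class CF: x is a binary user-item
# interaction vector, mask marks the observed ("query") items, and the loss
# reconstructs only the held-out items, so uncertainty is tied to what was
# actually observed.
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    def __init__(self, n_items, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_items, 256), nn.Tanh())
        self.mu = nn.Linear(256, latent)
        self.logvar = nn.Linear(256, latent)
        self.decoder = nn.Linear(latent, n_items)

    def forward(self, x, mask):
        h = self.encoder(x * mask)          # condition only on the observed subset
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def loss_fn(logits, x, mask, mu, logvar, beta=0.2):
    # Score reconstruction only on the held-out (unobserved) items.
    bce = nn.functional.binary_cross_entropy_with_logits(logits, x, reduction="none")
    recon = torch.sum(bce * (1 - mask))
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```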
2018
- DBKDA
A Distributed Collaborative Filtering Algorithm Using Multiple Data SourcesMohamed Reda Bouadjenek, Esther Pacitti, Maximilien Servajean, Florent Masseglia, and Amr El AbbadiIn The Tenth International Conference on Advances in Databases, Knowledge, and Data Applications, Nice, France, Mar 2018Best Paper Award
Collaborative Filtering (CF) is one of the most commonly used recommendation methods. CF consists of predicting whether, or how much, a user will like (or dislike) an item by leveraging the knowledge of the user’s preferences as well as that of other users. In practice, users interact and express their opinion on only a small subset of items, which makes the corresponding user-item rating matrix very sparse. Such data sparsity yields two main problems for recommender systems: (1) the lack of data to effectively model users’ preferences, and (2) the lack of data to effectively model item characteristics. However, there are often many other data sources that are available to a recommender system provider, which can describe user interests and item characteristics (e.g., users’ social network, tags associated with items, etc.). These valuable data sources may supply useful information to enhance a recommendation system in modeling users’ preferences and item characteristics more accurately and thus, hopefully, to make recommenders more precise. For various reasons, these data sources may be managed by clusters of different data centers, thus requiring the development of distributed solutions. In this paper, we propose a new distributed collaborative filtering algorithm, which exploits and combines multiple and diverse data sources to improve recommendation quality. Our experimental evaluation using real datasets shows the effectiveness of our algorithm compared to state-of-the-art recommendation algorithms.
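One simple way to picture combining sources, as a hedged Python sketch: compute a user-user similarity per data source (in a real deployment, at the data center where that source lives), blend them with weights, and use the blend in a standard neighborhood CF prediction. The blending weights are hypothetical, and this is not the paper's distributed algorithm.

```python
# Neighborhood CF prediction using a weighted blend of user-user similarity
# matrices, one per data source (ratings, social network, tags, ...).
import numpy as np

def predict(ratings, similarities, weights, user, item):
    """ratings: (n_users, n_items) with NaN for missing entries;
    similarities: list of (n_users, n_users) matrices, one per data source."""
    sim = sum(w * s for w, s in zip(weights, similarities))
    rated = ~np.isnan(ratings[:, item])     # users who rated this item
    s = sim[user, rated]
    if s.sum() == 0:
        return float(np.nanmean(ratings))   # fall back to the global mean
    return float(s @ ratings[rated, item] / s.sum())
```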
2017
-
Multi-field query expansion is effective for biomedical dataset retrievalMohamed Reda Bouadjenek and Karin VerspoorDatabase, Sep 2017In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical datasets with heterogeneous schemas through query reformulation. In particular, the method proposed transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% for MAP. However, the query expansion method based on the biomedical lexicon is much less resource intensive since it does not require computation of any relevance feedback set or any initial execution of the query. Hence, in terms of the trade-off between efficiency, execution time, and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at a large scale. In the official bioCADDIE challenge results, although our approach is ranked seventh in terms of the infNDCG evaluation metric, it ranks second in terms of P@10 and NDCG. Hence, the method proposed here provides overall good retrieval performance in relation to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one.
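For reference, the Rocchio strategy evaluated above can be sketched in a few lines of Python over TF-IDF vectors; the alpha/beta values below are common defaults, not necessarily the paper's settings.

```python
# Rocchio-style expansion: move the query vector toward the centroid of the
# (pseudo-)relevant feedback documents, then keep the highest-weighted terms.
import numpy as np

def rocchio_expand(query_vec, feedback_docs, vocab, alpha=1.0, beta=0.75, k=10):
    """query_vec: (|V|,) TF-IDF vector; feedback_docs: (n, |V|) TF-IDF matrix;
    vocab: list mapping dimension index -> term."""
    new_vec = alpha * query_vec + beta * feedback_docs.mean(axis=0)
    top = np.argsort(new_vec)[::-1][:k]
    return [vocab[i] for i in top]
```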
- JBI
Automated detection of records in biological sequence databases that are inconsistent with the literatureMohamed Reda Bouadjenek, Karin Verspoor, and Justin ZobelJournal of Biomedical Informatics, Sep 2017We investigate and analyse the data quality of nucleotide sequence databases with the objective of automatic detection of data anomalies and suspicious records. Specifically, we demonstrate that the published literature associated with each data record can be used to automatically evaluate its quality, by cross-checking the consistency of the key content of the database record with the referenced publications. Focusing on GenBank, we describe a set of quality indicators based on the relevance paradigm of information retrieval (IR). Then, we use these quality indicators to train an anomaly detection algorithm to classify records as “confident” or “suspicious”. Our experiments on the PubMed Central collection show that assessing the coherence between the literature and database records, through our algorithms, is an effective mechanism for assisting curators to perform data cleansing. Although fewer than 0.25% of the records in our data set are known to be faulty, we would expect that there are many more in GenBank that have not yet been identified. By automated comparison with the literature, they can be identified with a precision of up to 10% and a recall of up to 30%, while strongly outperforming several baselines. While these results leave substantial room for improvement, they reflect both the very imbalanced nature of the data, and the limited explicitly labelled data that is available. Overall, the obtained results show promise for the development of a new kind of approach to detecting low-quality and suspicious sequence records based on literature analysis and consistency. From a practical point of view, this will greatly help curators in identifying inconsistent records in large-scale sequence databases by highlighting records that are likely to be inconsistent with the literature.
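A minimal Python sketch of the overall pipeline, with an off-the-shelf anomaly detector standing in for the paper's method and random placeholders for the IR-based quality indicators:

```python
# Represent each database record as a vector of literature-consistency
# indicators, then flag outliers. IsolationForest is an illustrative
# stand-in for the anomaly detector; the 12-indicator width is arbitrary.
import numpy as np
from sklearn.ensemble import IsolationForest

records = np.random.rand(1000, 12)   # placeholder IR-based quality indicators
detector = IsolationForest(contamination=0.01, random_state=0).fit(records)
labels = detector.predict(records)   # +1 = "confident", -1 = "suspicious"
suspicious = np.where(labels == -1)[0]
```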
- CIKM
Learning Biological Sequence Types Using the LiteratureMohamed Reda Bouadjenek, Karin Verspoor, and Justin ZobelIn Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, Singapore, Sep 2017We explore in this paper automatic biological sequence type classification for records in biological sequence databases. The sequence type attribute provides important information about the nature of a sequence represented in a record, and is often used in search to filter out irrelevant sequences. However, the sequence type attribute is generally a non-mandatory free-text field, and thus it is subject to many errors including typos, mis-assignment, and non-assignment. In GenBank, this problem concerns roughly 18% of records, an alarming number that should worry the biocuration community. To address this problem of automatic sequence type classification, we propose the use of literature associated with sequence records as an external source of knowledge that can be leveraged for the classification task. We define a set of literature-based features and train a machine learning algorithm to classify a record into one of six primary sequence types. The main intuition behind using the literature for this task is that sequences appear to be discussed differently in scientific articles, depending on their type. The experiments we have conducted on the PubMed Central collection show that the literature is indeed an effective way to address this problem of sequence type classification. Our classification method reached an accuracy of 92.7%, and substantially outperformed two baseline approaches used for comparison.
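The setup can be pictured as ordinary multiclass text classification, as in this brief Python sketch with hypothetical toy data (TF-IDF features over the linked-article text and a linear classifier over sequence types; the paper's actual literature-based features are richer than raw text):

```python
# Multiclass classification of a record's sequence type from the text of
# its linked publications. The two toy examples below are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["placeholder text of articles linked to record 1",
         "placeholder text of articles linked to record 2"]
types = ["mRNA", "rRNA"]   # two of the six primary types, for illustration

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, types)
print(clf.predict(["placeholder literature text for a new record"]))
```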
-
Literature consistency of bioinformatics sequence databases is effective for assessing record qualityMohamed Reda Bouadjenek, Karin Verspoor, and Justin ZobelDatabase, Mar 2017Bioinformatics sequence databases such as GenBank or UniProt contain hundreds of millions of records of genomic data. These records are derived from direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centres; their diversity and scale mean that they suffer from a range of data quality issues including errors, discrepancies, redundancies, ambiguities, incompleteness and inconsistencies with the published literature. In this work, we seek to investigate and analyze the data quality of sequence databases from the perspective of a curator, who must detect anomalous and suspicious records. Specifically, we emphasize the detection of inconsistent records with respect to the literature. Focusing on GenBank, we propose a set of 24 quality indicators, which are based on treating a record as a query into the published literature, and then use query quality predictors. We then carry out an analysis that shows that the proposed quality indicators and the quality of the records have a mutual relationship, in which one depends on the other. We propose to represent record-literature consistency as a vector of these quality indicators. By reducing the dimensionality of this representation for visualization purposes using principal component analysis, we show that records which have been reported as inconsistent with the literature fall roughly in the same area, and therefore share similar characteristics. By manually analyzing records not previously known to be erroneous that fall in the same area as records known to be inconsistent, we show that one record out of four is inconsistent with respect to the literature. This high density of inconsistent records opens the way towards the development of automatic methods for the detection of faulty records. We conclude that literature inconsistency is a meaningful strategy for identifying suspicious records.Database URL: https://github.com/rbouadjenek/DQBioinformatics
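The visualization step reduces each record's 24-dimensional indicator vector to two dimensions, roughly as in this Python sketch (random placeholder data):

```python
# Project the per-record indicator vectors to 2-D with PCA so that
# known-inconsistent records can be checked for co-location.
import numpy as np
from sklearn.decomposition import PCA

indicators = np.random.rand(500, 24)          # one 24-d indicator vector per record
coords = PCA(n_components=2).fit_transform(indicators)
# Inspect whether flagged records cluster in the same region of the 2-D plot.
```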
2016
- InfoSys
Social networks and information retrieval, how are they converging? A survey, a taxonomy and an analysis of social information retrieval approaches and platformsMohamed Reda Bouadjenek, Hakim Hacid, and Mokrane BouzeghoubInformation Systems, Mar 2016There is currently a substantial body of research in the area of bridging the gap between Information Retrieval (IR) and Online Social Networks (OSN). This is mainly done by enhancing the IR process with information coming from social networks, a process called Social Information Retrieval (SIR). The main question one might ask is: what would be the benefits of using social information (whether content or structure) in the information retrieval process, and how is this currently done? With the growing number of efforts towards the combination of IR and social networks, it is necessary to build a clearer picture of the domain and synthesize the efforts in a structured and meaningful way. This paper reviews different efforts in this domain. It intends to provide a clear understanding of the issues as well as a clear structure of the contributions. More precisely, we propose (i) to review some of the most important contributions in this domain to understand the principles of SIR, (ii) a taxonomy to categorize these contributions, and finally, (iii) an analysis of some of these contributions and tools with respect to several criteria, which we believe are crucial to design an effective SIR approach. This paper is expected to serve researchers and practitioners as a reference to help them structure the domain, position themselves, and, ultimately, propose new contributions or improve existing ones.
- InfoSci
PerSaDoR: Personalized social document representation for improving web searchMohamed Reda Bouadjenek, Hakim Hacid, Mokrane Bouzeghoub, and Athena VakaliInformation Sciences, Mar 2016In this paper, we discuss a contribution towards the integration of social information in the index structure of an IR system. Since each user has his/her own understanding and point of view of a given document, we propose an approach in which the index model provides a Personalized Social Document Representation (PerSaDoR) of each document per user based on his/her activities in a social tagging system. The proposed approach relies on matrix factorization to compute the PerSaDoR of documents that match a query, at query time. The complexity analysis shows that our approach scales linearly with the number of documents that match the query, and thus, it can scale to very large datasets. PerSaDoR has also been intensively evaluated through an offline study and a user survey conducted on a large public dataset from delicious, showing significant benefits for personalized search compared to state-of-the-art methods.
2015
- ICAIL
A study of query reformulation for patent prior art search with partial patent applicationsMohamed Reda Bouadjenek, Scott Sanner, and Gabriela FerraroIn Proceedings of the 15th International Conference on Artificial Intelligence and Law, San Diego, California, Mar 2015Patents are used by legal entities to legally protect their inventions and represent a multi-billion dollar industry of licensing and litigation. In 2014, 326,033 patent applications were approved in the US alone – a number that has doubled in the past 15 years and which makes prior art search a daunting, but necessary task in the patent application process. In this work, we seek to investigate the efficacy of prior art search strategies from the perspective of the inventor who wishes to assess the patentability of their ideas prior to writing a full application. While much of the literature inspired by the evaluation framework of the CLEF-IP competition has aimed to assist patent examiners in assessing prior art for complete patent applications, less of this work has focused on patent search with queries representing partial applications. In the (partial) patent search setting, a query is often much longer than in other standard IR tasks, e.g., the description section may contain hundreds or even thousands of words. While the length of such queries may suggest query reduction strategies to remove irrelevant terms, intentional obfuscation and general language used in patents suggests that it may help to expand queries with additionally relevant terms. To assess the trade-offs among all of these pre-application prior art search strategies, we comparatively evaluate a variety of partial application search and query reformulation methods. Among numerous findings, querying with a full description, perhaps in conjunction with generic (non-patent specific) query reduction methods, is recommended for best performance. However, we also find that querying with an abstract represents the best trade-off in terms of writing effort vs. retrieval efficacy (i.e., querying with the description section leads to only marginal improvements) and that for such relatively short queries, generic query expansion methods help.
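A generic (non-patent-specific) query reduction of the kind referenced above can be sketched simply in Python: drop long-query terms whose IDF falls below a cutoff, since very common terms carry little discriminative signal when an entire description is used as a query. The cutoff is a hypothetical tuning parameter.

```python
# IDF-based query reduction for very long queries: keep only terms that are
# sufficiently rare in the collection.
import math

def reduce_query(terms, doc_freq, n_docs, min_idf=1.5):
    """terms: query terms; doc_freq: term -> number of documents containing it."""
    keep = []
    for t in terms:
        idf = math.log(n_docs / (1 + doc_freq.get(t, 0)))
        if idf >= min_idf:
            keep.append(t)
    return keep
```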
- SIGIR
On Term Selection Techniques for Patent Prior Art SearchMona Golestan Far, Scott Sanner, Mohamed Reda Bouadjenek, Gabriela Ferraro, and David HawkingIn Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, Mar 2015In this paper, we investigate the influence of term selection on retrieval performance on the CLEF-IP prior art test collection, using the Description section of the patent query with Language Model (LM) and BM25 scoring functions. We find that an oracular relevance feedback system that extracts terms from the judged relevant documents far outperforms the baseline and performs twice as well on MAP as the best competitor in CLEF-IP 2010. We find a very clear term selection value threshold for use when choosing terms. We also noticed that most of the useful feedback terms are actually present in the original query, and hypothesized that the baseline system could be substantially improved by removing negative query terms. We tried four simple automated approaches to identify negative terms for query reduction, but we were unable to notably improve on the baseline performance with any of them. However, we show that a simple, minimal interactive relevance feedback approach, where terms are selected from only the first retrieved relevant document, outperforms the best result from CLEF-IP 2010, suggesting the promise of interactive methods for term selection in patent prior art search.
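The minimal interactive feedback loop can be sketched in Python: once the user marks the first relevant result, keep only the original query terms that also appear in that document (the whitespace tokenization here is deliberately naive, and the whole routine is an illustration of the idea rather than the paper's system):

```python
# Minimal interactive relevance feedback: restrict the query to the terms it
# shares with the first document the user judged relevant, then re-query.
def interactive_feedback_query(query_terms, first_relevant_doc_text):
    doc_terms = set(first_relevant_doc_text.lower().split())
    return [t for t in query_terms if t.lower() in doc_terms]
```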
2013
- ICWE
Evaluation of Personalized Social Ranking Functions of Information RetrievalMohamed Reda Bouadjenek, Amyn Bennamane, Hakim Hacid, and Mokrane BouzeghoubIn Web Engineering, Mar 2013A number of interesting research efforts are currently being performed in the area of bridging the gap between Social Networks and Information Retrieval (IR). This is mainly done by enhancing the IR process with social information. Hence, many approaches have been proposed to improve the ranking process by personalizing it using social features. In this paper, we review some of these ranking functions.
- SIGIR
SoPRa: a new social personalized ranking function for improving web searchMohamed Reda Bouadjenek, Hakim Hacid, and Mokrane BouzeghoubIn Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, Mar 2013We present in this paper a contribution to IR modeling by proposing a new ranking function called SoPRa that considers the social dimension of the Web. This social dimension is any social information that surrounds documents along with the social context of users. Currently, our approach relies on folksonomies for extracting these social contexts, but it can be extended to use any social meta-data, e.g., comments, ratings, tweets, etc. The evaluation performed on our approach shows its benefits for personalized search.
- SIGIR
Using social annotations to enhance document representation for personalized searchMohamed Reda Bouadjenek, Hakim Hacid, Mokrane Bouzeghoub, and Athena VakaliIn Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, Mar 2013In this paper, we present a contribution to IR modeling. We propose an approach that computes on the fly a Personalized Social Document Representation (PSDR) of each document per user based on his/her social activities. The PSDRs are used to rank documents with respect to a query. This approach has been intensively evaluated on a large public dataset, showing significant benefits for personalized search.
- KDD
LAICOS: an open source platform for personalized social web searchMohamed Reda Bouadjenek, Hakim Hacid, and Mokrane BouzeghoubIn Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, Illinois, USA, Mar 2013In this paper, we introduce LAICOS, a social Web search engine, as a contribution to the growing area of Social Information Retrieval (SIR). Social information and personalization are at the heart of LAICOS. On the one hand, the social context of documents is added as a layer to their textual content traditionally used for indexing to provide Personalized Social Document Representations. On the other hand, the social context of users is used for the query expansion process using the Personalized Social Query Expansion framework (PSQE) proposed in our earlier works. We describe the different components of the system, relying on social bookmarking systems as a source of social information for personalizing and enhancing the IR process. We show how the internal structure of the indexes, as well as the query expansion process, operates using social information.
2011
- SIGIR
Personalized social query expansion using social bookmarking systemsMohamed Reda Bouadjenek, Hakim Hacid, Mokrane Bouzeghoub, and Johann DaigremontIn Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, Mar 2011We propose a new approach for social and personalized query expansion using social structures in the Web 2.0. While focusing on social tagging systems, the proposed approach considers (i) the semantic similarity between tags composing a query, (ii) a social proximity between the query and the user profile, and (iii) on the fly, a strategy for expanding user queries. The proposed approach has been evaluated using a large dataset crawled from del.icio.us.
2010
- GIS
GQBox: geospatial data quality assessmentYassine Lassoued, Mohamed Reda Bouadjenek, Omar Boucelma, Fernando Lemos, and Mokrane BouzeghoubIn Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, California, Mar 2010In order to measure and assess the quality of GIS, there exists only a sparse offering of tools, each providing specific functions of individual interest but not sufficient to address broader user requirements. Interoperability of these tools remains a technical challenge because of the heterogeneity of their models and access patterns. At the same time, quality analysts increasingly require integration facilities that allow them to consolidate and aggregate multiple quality measures acquired from different observations or data sources, by seamlessly using and combining different quality tools. Clearly, there is a gap between users’ requirements and the spatial data quality market. This demo paper illustrates GQBox, a geographic quality (tool)box. GQBox supplies a standards-based generic meta model that supports the definition of quality goals and metrics, and it provides a service-based infrastructure that allows interoperability among several quality tools.