- Where is the signal in tokenization space?
Renato Lui Geh, Honghua Zhang, Kareem Ahmed, Benjie Wang, Guy Van den Broeck
EMNLP 2024.
[abstract]
Large Language Models (LLMs) are typically shipped with tokenizers that deterministically encode text into so-called canonical token sequences, to which the LLMs assign probability values.
One common assumption is that the probability of a piece of text is the probability of its canonical token sequence.
However, the tokenization of a string is not unique: e.g., the Llama2 tokenizer encodes "Tokens" as [Tok,ens], but [Tok,en,s] also represents the same text.
In this paper, we study non-canonical tokenizations.
We prove that, given a string, it is computationally hard to find the most likely tokenization for an autoregressive LLM, as well as to compute the marginal probability over all possible tokenizations.
We then show how the marginal is, in most cases, indistinguishable from the canonical probability.
Surprisingly, we then empirically demonstrate the existence of a significant amount of signal hidden within tokenization space.
Notably, by simply aggregating the probabilities of non-canonical tokenizations, we achieve improvements across a range of LLM evaluation benchmarks for a variety of architectures, including transformers and state space models.
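A minimal sketch of the quantity the abstract marginalizes over, under assumptions made only for illustration: a toy vocabulary (not Llama2's) and a placeholder `seq_logprob` standing in for an autoregressive LLM's score of a token sequence. Every segmentation of the string into tokens is enumerated and their probabilities aggregated; in practice the number of tokenizations grows exponentially, which is why the exact marginal is hard.
```python
# Illustrative sketch only: enumerate every tokenization of a short string under a
# toy vocabulary and aggregate their probabilities.
import math

VOCAB = {"Tok", "en", "ens", "s"}        # toy vocabulary (assumption)

def tokenizations(text: str):
    """Yield every segmentation of `text` into vocabulary tokens."""
    if not text:
        yield []
        return
    for i in range(1, len(text) + 1):
        if text[:i] in VOCAB:
            for rest in tokenizations(text[i:]):
                yield [text[:i]] + rest

def seq_logprob(tokens):
    """Placeholder for an LLM's log-probability of a token sequence."""
    return -2.0 * len(tokens)            # dummy score, for illustration only

def marginal_logprob(text: str) -> float:
    """log sum_t p(t) over all tokenizations t of `text`."""
    return math.log(sum(math.exp(seq_logprob(t)) for t in tokenizations(text)))

print(list(tokenizations("Tokens")))     # [['Tok', 'en', 's'], ['Tok', 'ens']]
print(marginal_logprob("Tokens"))        # aggregated (marginal) log-probability
```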
- Scaling Tractable Probabilistic Circuits: A Systems Perspective
Anji Liu, Kareem Ahmed, Guy Van den Broeck
ICML 2024.
[abstract]
Probabilistic Circuits (PCs) are a general framework for tractable deep generative models, which support exact and efficient probabilistic inference on their learned distributions.
Recent modeling and training advancements have enabled their application to complex real-world tasks. However, the time and memory inefficiency of existing PC implementations hinders further scaling up.
This paper proposes PyJuice, a general GPU implementation design for PCs that improves prior art in several regards. Specifically, PyJuice is 1-2 orders of magnitude faster than existing systems (including very recent ones) at training large-scale PCs.
Moreover, PyJuice consumes 2-5x less GPU memory, which enables us to train larger models.
At the core of our system is a compilation process that converts a PC into a compact representation amenable to efficient block-based parallelization, which significantly reduces IO and makes it possible to leverage Tensor Cores available in modern GPUs.
Empirically, PyJuice can be used to improve state-of-the-art PCs trained on image (e.g., ImageNet32) and language (e.g., WikiText, CommonGen) datasets.
We further establish a new set of baselines on natural image and language datasets by benchmarking existing PC structures but with much larger sizes and more training epochs, with the hope of incentivizing future research.
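For context, a bare-bones sketch of what a probabilistic circuit computes. This is deliberately naive, node-by-node Python and is not the PyJuice API; compiling such DAGs into block-parallel GPU kernels is precisely what removes this per-node overhead.
```python
# Naive sketch of probabilistic-circuit evaluation (sum/product/Bernoulli-leaf nodes).
import torch

class Leaf:                                   # Bernoulli leaf over one binary variable
    def __init__(self, var, p): self.var, self.p = var, p
    def forward(self, x): return x[:, self.var] * self.p + (1 - x[:, self.var]) * (1 - self.p)

class Product:                                # product node: independent children
    def __init__(self, children): self.children = children
    def forward(self, x):
        out = torch.ones(x.shape[0])
        for c in self.children:
            out = out * c.forward(x)
        return out

class Sum:                                    # sum node: weighted mixture of children
    def __init__(self, children, weights): self.children, self.weights = children, weights
    def forward(self, x): return sum(w * c.forward(x) for w, c in zip(self.weights, self.children))

# p(x0, x1) as a mixture of two fully factorized components (toy parameters)
pc = Sum([Product([Leaf(0, 0.9), Leaf(1, 0.2)]),
          Product([Leaf(0, 0.1), Leaf(1, 0.7)])], weights=[0.6, 0.4])
x = torch.tensor([[1, 0], [0, 1]])
print(pc.forward(x))                          # exact likelihoods in one pass over the DAG
```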
- Probabilistically Rewired Message-Passing Neural Networks
Chendi Qian*, Andrei Manolache*, Kareem Ahmed, Zhe Zeng, Guy Van den Broeck, Mathias Niepert*, Christopher Morris*
ICLR 2024.
[abstract]
Message-passing graph neural networks (MPNNs) emerged as powerful tools for
processing graph-structured input. However, they operate on a fixed input graph
structure, ignoring potential noise and missing information. Furthermore, their
local aggregation mechanism can lead to problems such as over-squashing and
limited expressive power in capturing relevant graph structures. Existing solutions
to these challenges have primarily relied on heuristic methods, often disregarding
the underlying data distribution. Hence, devising principled approaches for learning
to infer graph structures relevant to the given prediction task remains an open
challenge. In this work, leveraging recent progress in exact and differentiable k-subset sampling, we devise probabilistically rewired MPNNs (PR-MPNNs), which
learn to add relevant edges while omitting less beneficial ones. For the first time,
our theoretical analysis explores how PR-MPNNs enhance expressive power, and
we identify precise conditions under which they outperform purely randomized
approaches. Empirically, we demonstrate that our approach effectively mitigates
issues like over-squashing and under-reaching. In addition, on established real-world datasets, our method exhibits competitive or superior predictive performance
compared to traditional MPNN models and recent graph transformer architectures.
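A hedged sketch of the rewiring step described above: candidate edges are scored and exactly k are kept. Gumbel-top-k is used here only as a simple stand-in for the exact, differentiable k-subset samplers PR-MPNNs rely on; `edge_scores` and `k` are illustrative names.
```python
# Sketch of probabilistic rewiring: keep exactly k candidate edges based on learned scores.
import torch

def rewire(edge_scores: torch.Tensor, k: int) -> torch.Tensor:
    """Return a {0,1} mask over candidate edges with exactly k ones."""
    gumbel = -torch.log(-torch.log(torch.rand_like(edge_scores)))
    mask = torch.zeros_like(edge_scores)
    mask[torch.topk(edge_scores + gumbel, k).indices] = 1.0
    return mask                               # the MPNN then runs on the rewired graph

scores = torch.randn(10)                      # scores for 10 candidate edges (assumption)
print(rewire(scores, k=3))
```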
- A Pseudo-Semantic Loss for Deep Generative Models with Logical Constraints
Kareem Ahmed, Kai-Wei Chang, Guy Van den Broeck
NeurIPS 2023.
[abstract]
Neuro-symbolic approaches bridge the gap between purely symbolic and neural approaches to learning.
This often requires maximizing the probability of a symbolic constraint in the neural network's output. However, output distributions are typically assumed to be fully-factorized, which prohibits the application of neuro-symbolic learning to more expressive output
distributions, such as autoregressive deep generative models. There, such probability computation is #P-hard,
even for simple constraints.
Instead, we propose to locally approximate the probability of the symbolic constraint under the pseudolikelihood distribution—the product
of its full conditionals given a sample from the model.
This allows our pseudo-semantic loss function to enforce the symbolic constraint.
Our method is related to several classical approximation schemes, including hogwild Gibbs sampling, consistent pseudolikelihood learning, and contrastive divergence.
We test our proposed approach on three distinct settings: Sudoku, shortest-path prediction, and detoxifying
large language models. Experiments show that pseudo-semantic loss greatly improves upon the base model's ability to satisfy the desired logical
constraint in its output distribution.
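A brute-force sketch of the quantity being approximated, with invented names (`conditionals`, `constraint`): around a model sample, each variable keeps its full conditional, and the constraint's probability is computed under the resulting product distribution. Real constraints would be handled with a circuit rather than enumeration.
```python
# Illustrative sketch, not the paper's implementation: probability of a constraint
# under the pseudolikelihood surrogate, i.e. the product of full conditionals around
# a model sample, computed here by brute force over a handful of binary variables.
import itertools
import torch

def pseudo_constraint_prob(conditionals: torch.Tensor, constraint) -> torch.Tensor:
    """conditionals[i] = p(y_i = 1 | y_{-i}); constraint maps a {0,1}-tuple to bool."""
    total = torch.tensor(0.0)
    for y in itertools.product([0, 1], repeat=conditionals.shape[0]):
        if constraint(y):
            on = torch.tensor(y, dtype=torch.bool)
            total = total + torch.where(on, conditionals, 1 - conditionals).prod()
    return total

conds = torch.tensor([0.8, 0.3, 0.1])        # toy full conditionals (assumption)
p = pseudo_constraint_prob(conds, lambda y: sum(y) == 1)   # "exactly one variable on"
print(-torch.log(p))                         # the pseudo-semantic loss for this sample
```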
- A Unified Approach to Count-Based Weakly-Supervised Learning
Vinay Shukla, Zhe Zeng*, Kareem Ahmed*, Guy Van den Broeck
NeurIPS 2023.
[abstract]
We observe that in weakly supervised learning for binary classification, the weak supervision can be seen as a constraint on label counts.
Inspired by this observation, we aim to develop a unified approach to count-based weakly supervised learning.
From first principles, we derive objectives for three of the most common weakly supervised learning paradigms for binary classification
and show that these objectives share the same computational building block, which is the probability of an arithmetic constraint defined
over the count of positive instances, i.e., the summation of binary labels. We further derive an algorithm to compute the count
probabilities in an exact and tractable way, which allows our proposed objectives to be integrated into the end-to-end training of neural
network models. A thorough empirical evaluation across all three weakly supervised learning paradigms shows that our proposed
objectives, computed exactly and tractably, achieve state-of-the-art or highly comparable results on various tasks.
We also provide experimental analysis illustrating that models learn effectively with our proposed objectives.
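The shared building block mentioned in the abstract lends itself to a compact sketch: the exact distribution of the label count under per-instance probabilities, obtained by a dynamic program. Names and the toy probabilities below are illustrative, not taken from the paper.
```python
# Sketch of the count-probability building block: q[k] = P(sum_i y_i = k) computed by a
# dynamic program over per-instance probabilities. Being pure tensor ops, it is
# differentiable and can sit inside a training loss.
import torch

def count_distribution(p: torch.Tensor) -> torch.Tensor:
    """p[i] = P(y_i = 1); returns q with q[k] = P(sum_i y_i = k)."""
    q = torch.zeros(p.shape[0] + 1)
    q[0] = 1.0
    for pi in p:                           # fold in one instance at a time
        q = torch.cat([q[:1] * (1 - pi),
                       q[1:] * (1 - pi) + q[:-1] * pi])
    return q

probs = torch.tensor([0.9, 0.2, 0.7])      # hypothetical classifier outputs
q = count_distribution(probs)
print(q, q.sum())                          # P(count = 0..3); sums to 1
```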
- SIMPLE: A Gradient Estimator for k-Subset Sampling
Kareem Ahmed*, Zhe Zeng*, Mathias Niepert, Guy Van den Broeck
ICLR 2023.
[abstract]
k-subset sampling is ubiquitous in machine learning, enabling regularization and interpretability through sparsity. The challenge lies
in rendering k-subset sampling amenable to end-to-end learning. This has typically involved relaxing the reparameterized samples to allow
for backpropagation, with the risk of introducing high bias and high variance. In this work, we fall back to discrete k-subset sampling on
the forward pass. This is coupled with using the gradient with respect to the exact marginals, computed efficiently, as a proxy for the true
gradient. We show that our gradient estimator, SIMPLE, exhibits lower bias and variance compared to state-of-the-art estimators, including the
straight-through Gumbel estimator when k=1. Empirical results show improved performance on learning to explain and sparse linear regression.
We provide an algorithm for computing the exact ELBO for the k-subset distribution, obtaining significantly lower loss compared to SOTA.
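A hedged sketch of the estimator's wiring, not the released implementation: the forward pass outputs a hard k-hot sample (Gumbel-top-k here is only a stand-in for exact k-subset sampling), while the backward pass routes gradients through the exact marginals, obtained by differentiating the log-partition function of the k-subset distribution.
```python
# Sketch of a SIMPLE-style estimator: discrete sample forward, exact marginals backward.
import torch

def log_partition(logits: torch.Tensor, k: int) -> torch.Tensor:
    """log e_k(exp(logits)): sum over all k-subsets of the products of weights."""
    w = logits.exp()
    e = torch.zeros(k + 1)
    e[0] = 1.0
    for wi in w:                            # DP over elementary symmetric polynomials
        e = torch.cat([e[:1], e[1:] + wi * e[:-1]])
    return e[k].log()

def simple_like_sample(logits: torch.Tensor, k: int) -> torch.Tensor:
    # exact marginals mu_i = P(x_i = 1 | k ones) as the gradient of log Z_k
    (mu,) = torch.autograd.grad(log_partition(logits, k), logits, create_graph=True)
    with torch.no_grad():                   # hard k-hot sample (Gumbel-top-k stand-in)
        gumbel = -torch.log(-torch.log(torch.rand_like(logits)))
        hard = torch.zeros_like(logits)
        hard[torch.topk(logits + gumbel, k).indices] = 1.0
    return hard + mu - mu.detach()          # value: hard sample; gradient: through mu

theta = torch.randn(8, requires_grad=True)
z = simple_like_sample(theta, k=3)
(z * torch.arange(8.0)).sum().backward()    # a downstream loss; gradients flow via mu
print(z, theta.grad)
```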
- Semantic Strengthening of Neuro-Symbolic Learning
Kareem Ahmed, Kai-Wei Chang, Guy Van den Broeck
AISTATS 2023.
[abstract]
Numerous neuro-symbolic approaches have recently been proposed, typically with the goal of adding symbolic knowledge to the output layer
of a neural network. Ideally, such losses maximize the probability that the neural network's predictions satisfy the underlying domain constraints.
Unfortunately, this type of probabilistic inference is often computationally infeasible. Neuro-symbolic approaches therefore commonly
resort to fuzzy approximations of this probabilistic objective, sacrificing sound probabilistic semantics, or to sampling, which is very
seldom feasible. We approach the problem by first assuming the constraint decomposes conditioned on the features learned by the network.
We iteratively strengthen our approximation, restoring the dependence between the constraints most responsible for degrading the quality
of the approximation. This corresponds to computing the mutual information between pairs of constraints conditioned on the network's
learned features, and may be construed as a measure of how well aligned the gradients of two distributions are. We show how to compute
this efficiently for tractable circuits. We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a
minimum-cost perfect matching, and solving Sudoku puzzles, observing that it improves upon the baselines while sidestepping intractability.
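In assumed notation (the paper's symbols may differ), the starting approximation and the conditional mutual information that guides which pair of constraints to re-couple are:
```latex
% Independence approximation over constraints alpha_1, ..., alpha_m given features z,
% and the conditional mutual information used to decide which pair to merge.
\[
  p(\alpha_1 \wedge \cdots \wedge \alpha_m \mid \mathbf{z})
  \;\approx\; \prod_{i=1}^{m} p(\alpha_i \mid \mathbf{z}),
  \qquad
  I(\alpha_i ; \alpha_j \mid \mathbf{z})
  \;=\; \sum_{a_i, a_j \in \{0,1\}} p(a_i, a_j \mid \mathbf{z})
        \log \frac{p(a_i, a_j \mid \mathbf{z})}{p(a_i \mid \mathbf{z})\, p(a_j \mid \mathbf{z})}.
\]
```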
- Semantic Probabilistic Layers for Neuro-Symbolic Learning
Kareem Ahmed, Stefano Teso, Kai-Wei Chang, Guy Van den Broeck, Antonio Vergari
NeurIPS 2022.
[abstract]
We design a predictive layer for structured-output prediction (SOP) that can be plugged into any neural network, guaranteeing that
its predictions are consistent with a set of predefined symbolic constraints. Our Semantic Probabilistic Layer (SPL) can model
intricate correlations, and hard constraints, over a structured output space all while being amenable to end-to-end learning
via maximum likelihood. SPLs combine exact probabilistic inference with logical reasoning in a clean and modular way, learning
complex distributions and restricting their support to solutions of the constraint. As such, they can faithfully, and efficiently,
model complex SOP tasks beyond the reach of alternative neuro-symbolic approaches. We empirically demonstrate that SPLs outperform
these competitors in terms of accuracy on challenging SOP tasks including hierarchical multi-label classification, pathfinding and
preference learning, while retaining perfect constraint satisfaction.
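In assumed notation, the layer's predictive distribution multiplies an expressive (circuit) distribution q by a constraint circuit c acting as an indicator, renormalized over valid outputs:
```latex
% SPL's predictive distribution: g(x) are the network's features; the normalizer
% stays tractable when q and c are compatible circuits.
\[
  p_\Theta(\mathbf{y} \mid \mathbf{x})
  \;=\;
  \frac{q_\Theta\!\left(\mathbf{y} \mid g(\mathbf{x})\right)\, c(\mathbf{x}, \mathbf{y})}
       {\sum_{\mathbf{y}'} q_\Theta\!\left(\mathbf{y}' \mid g(\mathbf{x})\right)\, c(\mathbf{x}, \mathbf{y}')}
\]
```
Any prediction violating the constraint thus receives probability zero by construction.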
- Neuro-Symbolic Entropy Regularization
Kareem Ahmed, Eric Wang, Kai-Wei Chang, Guy Van den Broeck
UAI 2022.
Oral Presentation.
[abstract]
In structured prediction, the goal is to jointly predict many output variables that together encode a structured object—
a path in a graph, an entity-relation triple, or an ordering of objects. Such a large output space makes learning hard and requires
vast amounts of labeled data. Different approaches leverage alternate sources of supervision. One approach—entropy
regularization—posits that decision boundaries should lie in low-probability regions. It extracts supervision from unlabeled
examples, but remains agnostic to the structure of the output space. Conversely, neuro-symbolic approaches exploit the knowledge that
not every prediction corresponds to a valid structure in the output space. Yet, they do not further restrict the learned output
distribution. This paper introduces a framework that unifies both approaches. We propose a loss, neuro-symbolic entropy regularization,
that encourages the model to confidently predict a valid object. It is obtained by restricting entropy regularization to the distribution
over only valid structures. This loss is efficiently computed when the output constraint is expressed as a tractable logic circuit. Moreover,
it seamlessly integrates with other neuro-symbolic losses that eliminate invalid predictions. We demonstrate the efficacy of our approach on
a series of semi-supervised and fully-supervised structured-prediction experiments, where we find that it leads to models whose predictions
are more accurate and more likely to be valid.
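Written out in assumed notation, the proposed regularizer is the entropy of the output distribution restricted to structures satisfying the constraint alpha:
```latex
% Neuro-symbolic entropy regularization: entropy of the model's distribution
% conditioned on satisfying the constraint alpha.
\[
  \mathcal{L}_{\mathrm{ent}}(\mathbf{x})
  \;=\;
  -\sum_{\mathbf{y} \,\models\, \alpha}
     p(\mathbf{y} \mid \mathbf{x}, \mathbf{y} \models \alpha)\,
     \log p(\mathbf{y} \mid \mathbf{x}, \mathbf{y} \models \alpha),
  \qquad
  p(\mathbf{y} \mid \mathbf{x}, \mathbf{y} \models \alpha)
  = \frac{p(\mathbf{y} \mid \mathbf{x})\,\mathbf{1}[\mathbf{y} \models \alpha]}
         {\sum_{\mathbf{y}' \models \alpha} p(\mathbf{y}' \mid \mathbf{x})}.
\]
```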
- PYLON: A PyTorch Framework for Learning with Constraints
Kareem Ahmed, Tao Li, Thy Ton, Quan Guo, Kai-Wei Chang, Parisa Kordjamshidi, Vivek Srikumar, Guy Van den Broeck, Sameer Singh
AAAI 2021, NeurIPS 2021.
[abstract]
Deep learning excels at learning low-level task information from large amounts of data, but struggles with learning high-level domain knowledge,
which can often be directly and succinctly expressed. In this work, we introduce Pylon, a neuro-symbolic training framework that builds on PyTorch
to augment procedurally trained neural networks with declaratively specified knowledge. Pylon allows users to programmatically specify constraints
as PyTorch functions, and compiles them into a differentiable loss, thus training predictive models that fit the data whilst satisfying the specified
constraints. Pylon includes both exact and approximate compilers to efficiently compute the loss, employing fuzzy logic, sampling methods,
and circuits, ensuring scalability even to complex models and constraints. A guiding principle in designing Pylon has been the ease with which any
existing deep learning codebase can be extended to learn from constraints using only a few lines: a function expressing the constraint and a single
line of code to compile it into a loss. We include case studies from natural language processing, computer vision, logical games, and knowledge graphs
that can be interactively trained and that highlight Pylon's usage.
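To convey the programming model only (the actual Pylon API and compilers may differ), here is a toy sketch in which the constraint is an ordinary function over decoded predictions and a naive sampling-based "compiler" turns it into a differentiable loss; `xor_constraint` and `sampling_loss` are invented names for illustration.
```python
# Not the Pylon API: a minimal sampling-based compilation of a user-written constraint
# into a differentiable loss over a factorized output distribution.
import torch

def xor_constraint(y: torch.Tensor) -> torch.Tensor:
    """User-written constraint: the two binary outputs must disagree."""
    return y[..., 0] != y[..., 1]

def sampling_loss(constraint, logits: torch.Tensor, num_samples: int = 128) -> torch.Tensor:
    """Naive surrogate: raise the likelihood of sampled assignments satisfying the constraint."""
    probs = torch.sigmoid(logits)
    with torch.no_grad():
        samples = torch.bernoulli(probs.expand(num_samples, -1))
        ok = constraint(samples).float()          # 1 where the sample satisfies it
    logp = (samples * probs.log() + (1 - samples) * (1 - probs).log()).sum(-1)
    return -(ok * logp).sum() / ok.sum().clamp(min=1.0)

logits = torch.zeros(2, requires_grad=True)       # two binary output variables
loss = sampling_loss(xor_constraint, logits)
loss.backward()                                   # nudges the outputs toward satisfying XOR
print(loss.item(), logits.grad)
```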
- Leveraging Unlabeled Data for Entity-Relation Extraction through Probabilistic Constraint Satisfaction
Kareem Ahmed, Eric Wang, Kai-Wei Chang, Guy Van den Broeck
arXiv preprint, 2021.
[abstract]
We study the problem of entity-relation extraction in the presence of symbolic domain knowledge. Such knowledge takes the
form of an ontology defining relations and their permissible arguments. Previous approaches set out to integrate such
knowledge into learning either through self-training or through approximations that lose the precise meaning
of the logical expressions. By contrast, our approach employs the semantic loss, which captures the precise meaning of a logical
sentence by maintaining a probability distribution over all possible states and guiding the model toward solutions that
minimize constraint violations. With a focus on low-data regimes, we show that semantic loss outperforms the baselines
by a wide margin.
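For reference, the semantic loss used here can be written as follows, in standard notation: alpha is the logical encoding of the ontology constraints and the p_i are the network's per-variable probabilities of a fully-factorized output distribution.
```latex
% Semantic loss: negative log-probability, under the factorized output distribution,
% of drawing an assignment that satisfies the logical sentence alpha.
\[
  \mathcal{L}_{\mathrm{sem}}(\alpha, \mathbf{p})
  \;=\;
  -\log \sum_{\mathbf{y} \,\models\, \alpha}
     \prod_{i}\, p_i^{\,y_i}\,(1 - p_i)^{\,1 - y_i}.
\]
```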