The MLOPT Idea Seminar is a student-led forum where junior researchers from around the world are invited to present their recent ideas and findings. The seminar aims to provide an open and supportive platform for researchers to share their published or in-progress papers and receive feedback from their peers. Speakers are also encouraged to bring up open problems or recent research that caught their attention, to spark interesting discussions and potential future collaborations. The series covers a range of topics, typically at (but not limited to) the intersection of machine learning and optimization. We meet on Fridays from 12:30 pm to 1:30 pm CT.
Upcoming Events
Feb 3, 2023: Kwangjun Ahn (MIT)
 Time: Feb 3, 2023, 12:30 PM CT
 Speaker: Kwangjun Ahn (MIT)
 Title: Learning threshold neurons via edge of stability
 Abstract: Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate. This lies in stark contrast to practical wisdom and empirical studies, such as the work of J. Cohen et al. (ICLR 2021), which exhibit startling new phenomena (the “edge of stability” or “unstable convergence”) and potential benefits for generalization in the large learning rate regime. Despite a flurry of recent works on this topic, however, the latter effect is still poorly understood. In this paper, we take a step towards understanding genuinely nonconvex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks. For these models, we prove the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn “threshold-like” neurons (i.e., neurons with a nonzero first-layer bias). This elucidates one possible mechanism by which the edge of stability can in fact lead to better generalization, as threshold neurons are basic building blocks with a useful inductive bias for many tasks.
 Short Bio: Kwangjun Ahn is a Ph.D. student at MIT EECS, advised by Professors Suvrit Sra and Ali Jadbabaie. His research interest lies in Machine Learning, Optimization, and Learning for Control.
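The classical stability threshold behind the talk's title can be illustrated on a toy quadratic (a minimal sketch of the generic phenomenon, not the paper's two-layer model): gradient descent with step size eta on a curvature-s quadratic contracts iff eta < 2/s, and "edge of stability" refers to training hovering near the threshold eta ≈ 2/sharpness.

```python
# Illustration only: on f(x) = (s/2) * x**2, gradient descent
# x <- x - eta * s * x contracts iff |1 - eta*s| < 1, i.e. iff eta < 2/s.

def gd_on_quadratic(sharpness, eta, x0=1.0, steps=50):
    x = x0
    for _ in range(steps):
        x -= eta * sharpness * x  # gradient of (s/2) x^2 is s*x
    return x

s = 4.0  # sharpness, so the stability threshold is 2/s = 0.5
stable = abs(gd_on_quadratic(s, 0.4))    # eta < 2/s: converges to 0
critical = abs(gd_on_quadratic(s, 0.5))  # eta = 2/s: oscillates, |x| constant
unstable = abs(gd_on_quadratic(s, 0.6))  # eta > 2/s: diverges
print(stable < 1e-9, critical == 1.0, unstable > 1e6)  # -> True True True
```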
Feb 17, 2023: TBA
TBA
Mar 3, 2023: Dheeraj Nagaraj (Google)
TBA
Previous Events
Dec 9, 2022: Liu Yang (UW-Madison)
 Time: Dec 9, 2022, 12:30 PM CT
 Speaker: Liu Yang (UW-Madison)
 Title: A Better Way to Decay: Proximal Gradient Training Algorithms for Neural Nets
 Abstract: Weight decay is one of the most widely used forms of regularization in deep learning, and has been shown to improve generalization and robustness. The optimization objective driving weight decay is a sum of losses plus a term proportional to the sum of squared weights. This paper argues that stochastic gradient descent (SGD) may be an inefficient algorithm for this objective. For neural networks with ReLU activations, solutions to the weight decay objective are equivalent to those of a different objective in which the regularization term is instead a sum of products of ℓ2 (not squared) norms of the input and output weights associated with each ReLU. This alternative (and effectively equivalent) regularization suggests a novel proximal gradient algorithm for network training. Theory and experiments support the new training approach, showing that it can converge much faster to the sparse solutions it shares with standard weight decay training.
 Short bio: Liu Yang is a graduate student in the UW-Madison CS Department, co-advised by Prof. Nowak, Prof. Papailiopoulos, and Prof. Lee. Her research interests lie at the intersection of machine learning and deep learning.
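The key object in the abstract above, the proximal operator of the non-squared ℓ2 norm, has a closed form (block soft-thresholding), which is what produces exact sparsity. A minimal NumPy sketch of one proximal gradient step, assuming a regularizer of the form lam * ||w||_2 (an illustration of the general technique, not the authors' released code):

```python
import numpy as np

# Sketch: prox_{t * ||.||_2}(v) shrinks the whole vector toward zero and
# sets it exactly to zero when ||v||_2 <= t -- this is what drives the
# sparse solutions mentioned in the abstract.

def prox_l2(v, thresh):
    """prox of thresh * ||.||_2: max(0, 1 - thresh/||v||_2) * v"""
    norm = np.linalg.norm(v)
    if norm <= thresh:
        return np.zeros_like(v)
    return (1.0 - thresh / norm) * v

def proximal_gradient_step(w, grad, eta, lam):
    # gradient step on the (unregularized) loss, then prox on the regularizer
    return prox_l2(w - eta * grad, eta * lam)

print(prox_l2(np.array([3.0, 4.0]), 1.0))  # shrinks: norm 5 -> norm 4
print(prox_l2(np.array([0.3, 0.4]), 1.0))  # small norm -> exactly zero
```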
Nov 25, 2022: Thanksgiving
No Events.
Nov 11, 2022: Dohyun Kwon (UW-Madison)
 Time: Nov 11, 2022, 12:30 PM CT
 Speaker: Dohyun Kwon (UW-Madison)
 Title: Convergence of score-based generative modeling and related problems
 Links: [Video]
 Abstract: Score-based generative models and diffusion probabilistic models have recently demonstrated remarkable performance in numerous applications. However, it remains unclear whether the distance between the given data distribution and the model distribution is sufficiently small. In this talk, we obtain an upper bound on the Wasserstein distance between the two distributions. Under suitable assumptions, this theoretically guarantees that the framework approximates data distributions in the space of probability measures equipped with the Wasserstein distance. Possible variants of the theoretical analysis and related open questions will be discussed. This talk is based on joint work with Ying Fan and Kangwook Lee (UW-Madison).
 Short bio: Dohyun Kwon is a Van Vleck Visiting Assistant Professor in the Department of Mathematics at the University of Wisconsin-Madison. His research interests are partial differential equations and their applications to physics and optimization. Before joining UW-Madison, he obtained his Ph.D. in mathematics from UCLA in June 2020 under the advisement of Professor Inwon Kim. He received his B.S. in Mathematical Sciences from Seoul National University in South Korea. He is a recipient of the Kwanjeong Educational Foundation Fellowship for doctoral study and the Beckenbach Award from UCLA.
Oct 28, 2022: Jiefeng Chen (UW-Madison)
 Time: Oct 28, 2022, 12:30 PM CT
 Speaker: Jiefeng Chen (UW-Madison)
 Title: Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles
 Links: [Video]
 Abstract: When a deep learning model is deployed in the wild, it can encounter test data drawn from distributions different from the training data distribution and suffer a drop in performance. For safe deployment, it is essential to estimate the accuracy of the pretrained model on the test data. However, the labels for the test inputs are usually not immediately available in practice, and obtaining them can be expensive. This observation leads to two challenging tasks: (1) unsupervised accuracy estimation, which aims to estimate the accuracy of a pretrained classifier on a set of unlabeled test inputs; and (2) error detection, which aims to identify misclassified test inputs. In this talk, I will introduce a principled and practically effective framework, called self-training ensembles, that simultaneously addresses these two tasks.
 Short bio: Jiefeng Chen is a Ph.D. student in the Computer Science Department at the University of Wisconsin-Madison. He is co-advised by Prof. Yingyu Liang and Prof. Somesh Jha. His research interest is trustworthy machine learning, including adversarial robustness, handling distribution shifts, and out-of-distribution detection.
Oct. 14, 2022: Yeonwoo Jeong (SNU)
 Time: Oct 14, 2022, 12:30 PM CT
 Speaker: Yeonwoo Jeong (SNU)
 Links: [Video]
 Title: Combinatorial optimization for machine learning
 Abstract: Discrete variables are often more efficient than continuous variables for defining real-world problems. However, neural networks and their gradient-descent-style training are fundamentally designed around continuous representations, so training neural networks with discrete variables is often difficult. Previous works proposed the Gumbel-softmax trick, the straight-through estimator, and the REINFORCE trick to handle discrete variables. In this seminar, I propose another approach to handling discrete variables via combinatorial optimization when training neural networks. The seminar covers several domains: metric learning, disentanglement, and pruning.
 Short bio: Yeonwoo is a fifth-year Ph.D. student in Computer Science and Engineering at Seoul National University, advised by Hyun Oh Song. He graduated from Seoul National University in 2018 with a B.S. in electrical and computer engineering. His research interests are combinatorial optimization and machine learning. He is also interested in blockchain and cryptocurrency.
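The straight-through estimator mentioned in the abstract above can be sketched in a few lines without an autograd framework (an illustration of the generic trick, not the speaker's proposed method): the forward pass applies a hard threshold, while the backward pass pretends the threshold is the identity, so gradients "pass straight through" to the continuous logits.

```python
import numpy as np

# Straight-through estimator (STE), minimal sketch.
def ste_forward(logits):
    return (logits > 0).astype(float)       # discrete 0/1 decision

def ste_backward(grad_wrt_output):
    return grad_wrt_output                  # pretend d(hard)/d(logits) = 1

# Toy usage: push the discrete output toward the target [1, 1, 0].
logits = np.array([-0.5, 0.3, 0.2])
target = np.array([1.0, 1.0, 0.0])
for _ in range(100):
    out = ste_forward(logits)
    grad_out = out - target                 # grad of 0.5 * ||out - target||^2
    logits -= 0.1 * ste_backward(grad_out)  # STE update on the logits
print(ste_forward(logits))                  # -> [1. 1. 0.]
```

Despite the biased gradient, the discrete output reaches the target, which is why the trick is a common baseline for the combinatorial approaches discussed in the talk.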
Sep. 2, 2022: Andrew Lowy (USC)
 Title: Private Federated Learning Without a Trusted Server
 Speaker: Andrew Lowy (USC)
 Time: Sep. 2, 2022, 12:30 pm CT
 Links: [Video]
 Abstract: This work considers federated learning in the absence of a trustworthy server/clients. In this setting, each client needs to ensure the privacy of its own data, even if the server or other clients act adversarially. This requirement motivates the study of client-level local differential privacy (CL-LDP), an intermediate privacy notion between the “no trust” classical local differential privacy (LDP) and “high trust” central differential privacy (CDP) models. We provide tight (up to logarithms) upper and lower bounds for CL-LDP convex/strongly convex federated stochastic optimization with homogeneous (i.i.d.) client data. The CL-LDP federated learning rates sit between the rates for classical LDP and CDP, and match the optimal statistical rates in certain practical parameter regimes. Remarkably, we show that similar rates are attainable for smooth losses with arbitrary heterogeneous client data distributions, via a linear-time accelerated CL-LDP algorithm. With a secure “shuffler” to anonymize client reports (but without the presence of a trusted server), our algorithm attains the optimal CDP rates for stochastic convex/strongly convex optimization. Time permitting, we will also discuss CL-LDP federated learning with nonconvex loss functions.
 Short Bio: Andrew Lowy is a PhD student in Applied Math at the University of Southern California, advised by Meisam Razaviyayn. Prior to attending USC, Andrew received his B.A. in public policy and economics at Princeton University and completed a post-bac in mathematics at Columbia University. His research interests lie in the areas of machine learning and optimization, primarily at their intersection with differential privacy and algorithmic fairness. He especially enjoys understanding fundamental limits and developing scalable algorithms.
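For background on the talk above, the classical Gaussian mechanism shows how a single client could privatize a bounded-sensitivity vector before sending it to an untrusted server (a textbook sketch, not the paper's CL-LDP algorithm; the noise calibration below is the standard one, valid for epsilon < 1):

```python
import math
import numpy as np

# Classical Gaussian mechanism: add N(0, sigma^2) noise with
# sigma = sensitivity * sqrt(2 * ln(1.25/delta)) / epsilon
# to achieve (epsilon, delta)-differential privacy (for epsilon < 1).

def gaussian_mechanism(vec, l2_sensitivity, eps, delta, seed=0):
    sigma = l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    rng = np.random.default_rng(seed)  # fixed seed just for reproducibility
    return vec + rng.normal(0.0, sigma, size=len(vec)), sigma

noisy, sigma = gaussian_mechanism(np.zeros(4), l2_sensitivity=1.0,
                                  eps=0.5, delta=1e-5)
print(sigma)  # noise scale grows as eps shrinks (more privacy, more noise)
```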
Aug. 19, 2022: Peihao Wang (UT-Austin)
 Title: Signal Processing for Implicit Neural Representation
 Time: Aug 19, 12:30 pm CT
 Links: [Video][Paper]
 Abstract: Implicit Neural Representations (INRs), which encode continuous multimedia data via multilayer perceptrons, have shown undeniable promise in various computer vision tasks. Despite many successful applications, editing and processing an INR remains intractable, as signals are represented by agnostic parameters of a neural network. Existing works manipulate such continuous representations by processing a discretized instance, which breaks down the compactness and continuous nature of INRs. In this work, we present a pilot study on the question: how can we directly modify an INR without explicit decoding? We answer this question by proposing an implicit neural signal processing network, dubbed INSP-Net, built on differential operators on INRs. Our key insight is that spatial gradients of neural networks can be computed analytically and are invariant to translation, and we show mathematically that any continuous convolution filter can be uniformly approximated by a linear combination of high-order differential operators. With these two knobs, we instantiate the INR signal operator as a composition of computational graphs corresponding to the high-order derivatives, where the weighting parameters can be either handcrafted or learned from data. Based on our proposed INSP-Net, we further build the first Convolutional Neural Network (CNN) that implicitly runs on INRs, named INSP-ConvNet. Our experiments validate the expressiveness of INSP-Net and INSP-ConvNet in fitting low-level image processing kernels (e.g., edge detection, blurring, deblurring, denoising, inpainting) as well as for high-level tasks on implicit fields such as image classification.
 Bio: Peihao Wang is a PhD student in Electrical and Computer Engineering at The University of Texas at Austin, advised by Prof. Atlas Wang. Prior to that, he obtained his bachelor’s degree from ShanghaiTech University, where he worked with Prof. Jingyi Yu and Prof. Manolis Tsakiris. His research interests lie in the theory and applications of graph representation learning, 3D vision/geometry/graphics, and computational photography.
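The "spatial gradients can be computed analytically" insight from the abstract above can be checked on a toy INR: a one-hidden-layer tanh network over a 1-D coordinate, with arbitrary random weights (illustration only, not INSP-Net). The chain-rule derivative matches a finite-difference estimate.

```python
import numpy as np

# Toy INR: a coordinate-to-value MLP with one tanh hidden layer.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 1)), rng.normal(size=8)
W2, b2 = rng.normal(size=8), 0.1

def f(x):
    """INR forward pass: coordinate x -> signal value."""
    return W2 @ np.tanh(W1[:, 0] * x + b1) + b2

def df(x):
    """Analytic spatial gradient df/dx via the chain rule."""
    h = np.tanh(W1[:, 0] * x + b1)
    return W2 @ ((1.0 - h**2) * W1[:, 0])

x = 0.3
fd = (f(x + 1e-6) - f(x - 1e-6)) / 2e-6  # central finite difference
print(abs(df(x) - fd))                    # tiny (finite-difference error only)
```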
Aug. 5, 2022: Sixu Li (UW-Madison)
 Speaker: Sixu Li (UW-Madison)
 Time: Aug 5th, 12:30 pm CT
 Links: [Video]
 Title: Wasserstein Barycenter-based Model Fusion and Connections to Loss Landscapes of Neural Networks
 Abstract: We propose a unified mathematical framework for neural network (NN) model fusion based on the concepts of the Wasserstein barycenter (WB) and the Gromov-Wasserstein barycenter (GWB). In our framework, the fusion occurs in a layer-wise manner and builds on an interpretation of a node in a network as a function of the layer preceding it. The versatility of our mathematical framework allows us to handle the fusion of different types of NNs, including fully connected NNs, CNNs, ResNets, RNNs, and LSTMs, in each case exploiting the specific structure of the network architecture. We present extensive numerical experiments to: 1) illustrate the strengths of our approach in relation to other model fusion methodologies and 2) reveal new insights about the loss landscapes for different types of network architectures and datasets. In particular, from a certain perspective, our findings provide new empirical evidence for recent conjectures that local minima found by gradient-based methods end up lying in the same basin of the loss landscape after a proper permutation of weights is applied to one of the models.
 Short Bio: Sixu is a second-year PhD student in the Statistics department working with Professor Nicolas Garcia Trillos. His research interests include optimal transport theory, interacting particle systems, and their applications to federated learning.
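The permutation symmetry behind the basin conjecture in the abstract above is easy to demonstrate: hidden neurons can be reordered without changing a network's function, so fusion methods first align neurons across models before averaging. A brute-force toy sketch of align-then-average on one weight matrix (the talk's method uses optimal-transport barycenters instead of enumeration):

```python
import itertools
import numpy as np

def align_and_average(WA, WB):
    """WA, WB: (hidden, input) weight matrices of two one-layer nets.
    Finds the row permutation of WB closest to WA, then averages."""
    n = WA.shape[0]
    best_perm = min(itertools.permutations(range(n)),
                    key=lambda p: np.linalg.norm(WA - WB[list(p)]))
    return 0.5 * (WA + WB[list(best_perm)]), best_perm

WA = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
WB = WA[[2, 0, 1]]                  # functionally the same net, neurons shuffled
fused, perm = align_and_average(WA, WB)
print(perm)                         # -> (1, 2, 0): the shuffle is undone
```

Naive coordinate-wise averaging of WA and WB would blend unrelated neurons; after alignment the fused model here is identical to WA.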
July 8, 2022: Jiawei Zhang (MIT)
Speaker: Jiawei Zhang (MIT)
 Time: July 8th, 2022, 12:30 pm CT (1:30 pm ET)
 Link: [Slides][Video]
 Title: What is a Good Metric to Study Generalization of Minimax Learners?
 Abstract: Minimax optimization has served as the backbone of many machine learning (ML) problems. Although the convergence behavior of optimization algorithms has been extensively studied in minimax settings, their generalization guarantees in stochastic minimax optimization problems, i.e., how the solution trained on empirical data performs on unseen testing data, have been relatively underexplored. A fundamental question remains elusive: what is a good metric to study the generalization of minimax learners? In this paper, we aim to answer this question by first showing that primal risk, a universal metric for studying generalization in minimization problems, which has also been adopted recently to study generalization in minimax ones, fails in simple examples. We thus propose a new metric for studying the generalization of minimax learners: the primal gap, defined as the difference between the primal risk and its minimum over all models, to circumvent these issues. Next, we derive generalization error bounds for the primal gap in nonconvex-concave settings. As byproducts of our analysis, we also solve two open questions: establishing generalization error bounds for the primal risk and the primal-dual risk (another existing metric that is only well-defined when the global saddle point exists) in the strong sense, i.e., without strong concavity or assuming that the maximization and expectation can be interchanged, while either of these assumptions was needed in the literature. Finally, we leverage this new metric to compare the generalization behavior of two popular algorithms, gradient descent-ascent (GDA) and gradient descent-max (GDMax), in stochastic minimax optimization.
 Short Bio: Jiawei Zhang is a postdoctoral associate in the Laboratory for Information & Decision Systems at MIT, working with Prof. Asuman Ozdaglar. He received his Ph.D. from the Chinese University of Hong Kong (Shenzhen), supervised by Prof. Tom Luo. Previously, he received his B.Sc. in Mathematics (Hua Loo-Keng Talent Program) from the University of Science and Technology of China. His research interests include optimization theory and algorithms for nonconvex problems, with applications to machine learning and signal processing.
July 1, 2022: Sewon Min (UWashington)
Speaker: Sewon Min (UWashington)
 Time: July 1, 2022, 12:30 pm CT
 Link: [Slides]
 Title: Understanding and Improving Learning through Inference with Large Language Models
 Abstract: Language models are capable of “learning at inference” (also referred to as in-context learning): learning a new task by conditioning on k examples and making a prediction for a new input with no parameter updates. While impressive, models suffer from high variance and low worst-case accuracy. Moreover, we do not understand how or why in-context learning works. First, I will introduce new methods that lead to significant performance gains by reducing variance and improving worst-case accuracy. I will then show that in-context learning in fact works very differently from conventional learning: the model does not benefit from correctly paired training data, but rather benefits from the correct specification of the independent distributions of inputs and labels. Finally, I will conclude the talk with lessons learned, limitations, and avenues for future work.
 Short Bio: Sewon Min is a Ph.D. student in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, and a part-time visiting researcher at Meta AI. Her research is in the area of natural language processing and machine learning. She was a co-organizer of multiple workshops and tutorials at ACL, EMNLP, NeurIPS, and AKBC, including a workshop on Machine Reading for Question Answering, a workshop on Representation Learning for NLP, a workshop on Semiparametric Methods in NLP, and a tutorial on Zero- and Few-shot Learning with Pretrained Language Models. Prior to UW, she obtained a B.S. degree in Computer Science & Engineering from Seoul National University.
Jun. 24, 2022: Yuxin Sun (UW-Madison)
Speaker: Yuxin Sun (UW-Madison)
 Link: [Video]
 Time: June 24, 12:30 pm CT (10:30 am PT, 1:30 pm ET)
 Title: On Outlier Robust Learning of Ising Models
 Abstract: We study the problem of learning high-temperature Ising models in the outlier-robust setting, where a constant fraction of the samples is adversarially corrupted. We provide the first computationally efficient robust learning algorithm for this problem with near-optimal error guarantees of $O(\epsilon\log(1/\epsilon))$. Our algorithm can be seen as a special case of an algorithm for robustly learning a distribution from a general exponential family. To prove its correctness for Ising models, we establish new anti-concentration results for degree-2 polynomials of Ising models that may be of independent interest. In addition, we show that no efficient SQ algorithm with access to an $\epsilon$-corrupted ferromagnetic high-temperature Ising model can learn the model to total variation distance $o(\epsilon\log(1/\epsilon))$. Our SQ lower bound matches the error guarantees of known algorithms for this problem, providing evidence that the current upper bound for this task is the best possible. At the technical level, we develop a generic SQ lower bound for discrete high-dimensional distributions starting from low-dimensional moment matching constructions that we believe will find other applications.
 Short Bio: Yuxin Sun is a Ph.D. student in the Department of Computer Sciences at the University of Wisconsin-Madison, advised by Prof. Ilias Diakonikolas. His research interests lie between theoretical computer science and machine learning, with a focus on developing efficient algorithms with provable guarantees for fundamental problems in data science, especially in settings where data might be heterogeneous or contaminated.
Jun. 3, 2022: Dung (June) Thai (UMass Amherst)
Speaker: Dung (June) Thai (UMass Amherst)
 Link: [Video]
 Title: CaseBased Reasoning for Question Answering
 Abstract: Question answering (QA) tasks often necessitate a diverse set of reasoning chains across various types of questions and data availability. Handling new combinations of inferential chains and entities while adapting to new data is challenging. Case-Based Reasoning (CBR) is well suited to these needs by design. A recent study (Das et al.) successfully adopted CBR for supervised knowledge-base QA, significantly improving performance on questions with novel entities. This talk further explores CBR methods for weakly supervised KBQA and open-domain QA.
 Short Bio: Dung (June) Thai is a Ph.D. student at UMass Amherst, advised by Prof. Andrew McCallum. Her research focuses on question answering systems. Prior to the Ph.D. program, she received her B.S. and M.S. degrees from Vietnam National University.
May 13, 2022: Xuefeng Du (UW-Madison)
 Links: [Video]
 Title: Unknown-aware learning for object detection
 Abstract: Out-of-distribution (OOD) detection is indispensable for deploying machine learning models in the wild. One of the key challenges in OOD detection is that models lack supervision signals from unknown data. Previous approaches rely on real outlier datasets for model regularization, which can be costly and sometimes infeasible to obtain in practice. In this talk, we will discuss our recent approaches, virtual outlier synthesis and spatial-temporal unknown distillation, which generate unknowns in a convenient way. We then design sophisticated regularization terms to help our model achieve strong OOD detection performance on a new benchmark for object detection.
 Short bio: Xuefeng is a second-year CS Ph.D. student at UW-Madison. His research focus is trustworthy machine learning, such as adversarial robustness, learning with label noise, and out-of-distribution detection. He is also interested in neural architecture search, graph mining, and high-level vision tasks, such as object detection and segmentation.
Apr. 29, 2022: Andrew Wagenmaker (UWashington)
 Links: [Video]
 Title: Towards the True Complexity of Learning Near-Optimal Policies in Markov Decision Processes
 Abstract: Recent advances in reinforcement learning (RL) demonstrate its potential for solving real-world problems. However, significant challenges remain, limiting its widespread deployment. In particular, the large sample complexity (the number of interactions with the environment) required by RL algorithms to learn near-optimal policies has proved a critical difficulty. In this work, we take steps towards addressing this challenge by seeking to understand the true sample complexity of RL, with the goal of developing algorithms that take into account problem structure to learn efficiently. We consider two directions in particular. First, we study the instance-dependent complexity of learning near-optimal policies in tabular Markov Decision Processes (MDPs). We seek to obtain sample complexities that scale not with worst-case measures such as the number of states and actions, but with quantities that depend on the inherent difficulty of each individual MDP. We propose a new instance-dependent measure of complexity, the gap-visitation complexity, and an algorithm with sample complexity scaling as the gap-visitation complexity. Furthermore, by analyzing the gap-visitation complexity on several examples, we show that existing approaches to learning near-optimal policies can have sample complexity that is arbitrarily suboptimal. Next, we study how the structure of the MDP affects the ability to generalize across reward functions. We show that, when the MDP has a particular underlying linear structure, the sample complexity of learning a near-optimal policy for an arbitrary number of reward functions is, in the worst case, no larger than the sample complexity of learning a near-optimal policy for a single reward function, implying that reward-free RL is no harder than reward-aware RL in linear MDPs.
 Short bio: Andrew is a fourth-year Ph.D. student in Computer Science at the University of Washington working with Kevin Jamieson. His research interests are in active learning, reinforcement learning, and control theory. He is supported by an NSF Graduate Research Fellowship.
Apr. 15, 2022: Jeongyeol Kwon (UTexas)
 Time: 12:30 pm CDT
 Location: Orchard view room, Wisconsin Institutes for Discovery (WID), 330 N Orchard St.
 Title: Reinforcement Learning with Latent Contexts
 Abstract: Learning a near-optimal policy in a partially observable system remains an elusive challenge in contemporary reinforcement learning. One example where such a challenge arises is a system with latent contexts. In this talk, we consider episodic reinforcement learning in a reward-mixing Markov decision process (RM-MDP). There, a reward function is drawn from one of multiple possible reward models at the beginning of every episode, but the identity of the chosen reward model (the context) is not revealed to the agent. With no assumptions on the system dynamics, even for two switching-reward models the problem requires several new ideas beyond existing algorithmic and analysis techniques for efficient exploration. We develop the first efficient algorithm that does not require any assumptions for two switching-reward models. I will then briefly discuss the scalability and generalizability of the proposed method in more challenging scenarios.
 Short bio: Jeongyeol is a Ph.D. Candidate in Electrical and Computer Engineering at the University of Texas at Austin. He is broadly interested in theoretical aspects of machine learning, with a focus on Statistical Learning Theory, Optimization, Bandits/Reinforcement Learning, and Robust Statistics. His current research aims to develop efficient and robust algorithms for sequential decisionmaking problems arising from real applications.
Apr. 1, 2022: Defeng Liu (PolyMtl)
 Link: [Video]
 Title: Learning to search in local branching for solving mixed-integer programs
 Abstract: Finding high-quality solutions to mixed-integer linear programming problems (MILPs) is of great importance for many practical applications. In this respect, the refinement heuristic local branching (LB) was proposed to produce improving solutions and has been highly influential for the development of local search methods in MILP. For an LB algorithm, the choice of the neighborhood size is critical to performance. Although it was initialized with a conservative value in the original LB scheme, our new observation is that the best size is strongly dependent on the particular MILP instance. In this work, we investigate the relation between the size of the search neighborhood and the behavior of the underlying LB algorithm, and we devise a learning-based framework for guiding the neighborhood search of the LB heuristic. We show computationally that the neighborhood size can indeed be learned, leading to improved performance, and that the overall algorithm generalizes well, both with respect to the instance size and, remarkably, across instances.
 Short Bio: Defeng Liu is currently pursuing his Ph.D. in Mathematics at Polytechnique Montréal, under the supervision of Prof. Andrea Lodi. His research focuses on Mixed-Integer Programming, Mathematical Optimization, and Machine Learning. In particular, Defeng is interested in applying deep learning and reinforcement learning techniques to mixed-integer programs to solve complex problems, aiming to bring machine learning and operations research closer together.
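The local branching neighborhood discussed in the abstract above is a Hamming ball around the incumbent: for binary variables, LB adds the constraint sum over i with x̄_i = 1 of (1 - x_i), plus sum over i with x̄_i = 0 of x_i, at most k. A solver-free sketch of the membership test that this constraint encodes (illustration only, not the paper's learning framework):

```python
# Given an incumbent 0/1 solution x_bar, the LB constraint restricts the
# search to solutions within Hamming distance k of x_bar.

def in_lb_neighborhood(x, x_bar, k):
    """True iff x satisfies the local branching constraint of radius k."""
    dist = sum(1 - xi if xb == 1 else xi for xi, xb in zip(x, x_bar))
    return dist <= k

x_bar = [1, 0, 1, 0]
print(in_lb_neighborhood([1, 0, 1, 1], x_bar, k=1))  # one flip -> True
print(in_lb_neighborhood([0, 1, 0, 1], x_bar, k=2))  # four flips -> False
```

The talk's point is that the radius k, fixed conservatively in the original LB scheme, is worth choosing adaptively per instance.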
Mar. 18, 2022: Yae Jee Cho (CMU)
 Links: [Video]
 Title: Leveraging Biased Client Selection in Federated Learning
 Abstract: Federated learning is a distributed optimization paradigm that enables a large number of resource-limited client nodes to cooperatively train a model without data sharing. Previous works have analyzed federated learning by accounting for data heterogeneity, communication/computation limitations, and partial client participation. However, most assume unbiased client participation, where clients are selected such that the aggregated model update is unbiased. In our work, we investigate how biased client selection can be leveraged to improve many aspects of federated learning, including convergence speed, fairness, and client incentives.
 Short Bio: Yae Jee Cho is a Ph.D. student in the ECE department at Carnegie Mellon University advised by Prof. Gauri Joshi. Her main research interests are in distributed machine learning, in particular, federated learning with a focus on client systems and data heterogeneity. She received her B.S. and M.S. degrees in Electrical & Computer Engineering from Yonsei University. She has worked with the Privacy in AI team at Microsoft Research as a research intern on federated learning and is a recipient of the Korean Government Doctoral Scholarship.
Mar. 4, 2022: Boxin Zhao (UChicago)
 Links: [arXiv][arXiv]
 Title: Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback
 Abstract: In federated learning (FL) problems, client sampling plays a key role in the convergence speed of the training algorithm. However, despite its importance in FL, client sampling has received little study. In this paper, we propose an online learning with bandit feedback framework to understand the client sampling problem in FL. By adapting an Online Stochastic Mirror Descent algorithm to minimize the variance of the gradient estimate, we propose a new adaptive client sampling algorithm. In addition, we use an online ensemble method and a doubling trick to automatically choose the tuning parameters in the algorithm. Theoretically, we show a dynamic regret bound with the theoretically optimal sampling sequence as the comparator; we also include the total variation of this sequence in our upper bound, which is a natural measure of the intrinsic difficulty of the problem. To the best of our knowledge, these theoretical contributions are novel to the existing literature. Moreover, through both synthetic and real-data experiments, we show empirical evidence of the advantages of our proposed algorithm over widely used uniform sampling and other online-learning-based sampling strategies from previous studies. We also examine its robustness to the choice of tuning parameters. Finally, we discuss possible extensions to sampling without replacement and to personalized FL objectives. While the original goal is to solve the client sampling problem, this work has more general applications to stochastic gradient descent and stochastic coordinate descent methods.
 Short bio: Boxin Zhao is a Ph.D. student in Statistics and Econometrics at the University of Chicago, advised by Prof. Mladen Kolar. His research interests include machine learning, federated learning, probabilistic graphical models, highdimensional statistics, and optimization, with a focus on developing novel methodologies with both practical applications and theoretical guarantees.
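The variance-of-gradient-estimation objective in the abstract above can be seen in a three-client toy example (an illustration of the general importance-sampling principle, not the paper's mirror-descent algorithm): sampling client i with probability p_i and reweighting by 1/(n p_i) keeps the aggregated gradient unbiased for any p, while the choice of p controls the variance, which is the quantity an adaptive sampler tries to minimize online.

```python
import numpy as np

def estimator_mean_and_var(grads, p):
    """Mean and variance of the one-client estimator g_i / (n * p_i)."""
    n = len(grads)
    values = grads / (n * p)        # estimator value if client i is drawn
    mean = np.sum(p * values)       # always the true average gradient
    var = np.sum(p * values**2) - mean**2
    return mean, var

grads = np.array([0.1, 0.1, 4.0])   # one client has a much larger gradient
uniform = np.full(3, 1 / 3)
importance = grads / grads.sum()    # p_i proportional to |g_i|

m_u, v_u = estimator_mean_and_var(grads, uniform)
m_i, v_i = estimator_mean_and_var(grads, importance)
print(m_u, m_i)    # both equal the true mean (unbiased for any p)
print(v_u > v_i)   # importance sampling has lower variance -> True
```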
Feb. 18, 2022: Sungryull Sohn (LG AI Research US)
 Links: [Video] [arXiv]
 Title: Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning
 Abstract: Operating in the real world often requires agents to learn about a complex environment and apply this understanding to achieve a breadth of goals. This problem, known as goal-conditioned reinforcement learning (GCRL), becomes especially challenging for long-horizon goals. Current methods have tackled this problem by augmenting goal-conditioned policies with graph-based planning algorithms. However, they struggle to scale to large, high-dimensional state spaces and assume access to exploration mechanisms for efficiently collecting training data. In this work, we introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments so as to obtain a policy that is proficient for any goal. SFL leverages the ability of successor features (SF) to capture transition dynamics, using them to drive exploration by estimating state novelty and to enable high-level planning by abstracting the state space as a nonparametric landmark-based graph. We further exploit SF to directly compute a goal-conditioned policy for inter-landmark traversal, which we use to execute plans to “frontier” landmarks at the edge of the explored state space. We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces and outperforms state-of-the-art baselines on long-horizon GCRL tasks.
 Short bio: Sungryull Sohn is a research scientist at LG AI Research US. He received B.S. and M.S. degrees in Electrical Engineering from KAIST, and a Ph.D. degree from the University of Michigan. Before his Ph.D., he worked at the Electronics and Telecommunications Research Institute (ETRI) as a researcher. During his Ph.D., he worked at Microsoft Research as a research intern and at Google Brain as a research intern and student researcher. His research interests include various areas in deep reinforcement learning, such as compositional task learning, hierarchical reinforcement learning, meta-reinforcement learning, multi-task reinforcement learning, offline reinforcement learning, and neural combinatorial optimization.
Feb. 4, 2022: Jiaxin Hu (UW-Madison)
 Links: [Video] [arXiv]
 Title: Multiway Spherical Clustering via Degree-Corrected Tensor Block Models
 Abstract: We consider the problem of multiway clustering in the presence of unknown degree heterogeneity. Such data problems arise commonly in applications such as recommender systems, neuroimaging, community detection, and hypergraph partitioning in social networks. Allowing degree heterogeneity provides great flexibility in clustering models, but the extra complexity poses significant challenges in both statistics and computation. Here, we develop a degree-corrected tensor block model with estimation accuracy guarantees. We present the phase transition of clustering performance based on the notion of angle separability, and we characterize three signal-to-noise regimes corresponding to different statistical-computational behaviors. In particular, we demonstrate that an intrinsic statistical-to-computational gap emerges only for tensors of order three or greater. Further, we develop an efficient polynomial-time algorithm that provably achieves exact clustering under mild signal conditions. The efficacy of our procedure is demonstrated through two data applications, one on the Human Connectome Project and another on the Peru legislation network dataset.
 Short Bio: Jiaxin is a second-year Ph.D. student in Statistics at the University of Wisconsin-Madison working with Prof. Miaoyan Wang. Her research interests lie at the intersection of statistics and machine learning. Currently, she is working on higher-order tensor methods with applications to networks. Prior to the Ph.D. program, she received a Master’s degree in Statistics from UW-Madison in 2020 and a Bachelor’s degree in Statistics from Wuhan University in 2019.
Jan. 21, 2022: Jy-yong Sohn (UW-Madison)
 Links: [Video] [arXiv]
 Title: GenLabel: Mixup Relabeling using Generative Models
 Abstract: Mixup is a data augmentation method that generates new data points by mixing pairs of input data points. While mixup generally improves prediction performance, it sometimes degrades it. In this paper, we first identify the main causes of this phenomenon by theoretically and empirically analyzing the mixup algorithm. To resolve this, we propose GenLabel, a simple yet effective relabeling algorithm designed for mixup. In particular, GenLabel helps the mixup algorithm correctly label mixup samples by learning the class-conditional data distribution using generative models. Via extensive theoretical and empirical analysis, we show that mixup, when used together with GenLabel, can effectively resolve this phenomenon, improving generalization performance and adversarial robustness.
 Short Bio: Jy-yong is a postdoctoral researcher in the Department of Electrical and Computer Engineering (ECE) at the University of Wisconsin-Madison, working with Prof. Dimitris Papailiopoulos and Prof. Kangwook Lee. He is interested in the intersection of machine learning, information theory, and distributed algorithms. He received his Ph.D. degree in 2020 from KAIST, under the supervision of Prof. Jaekyun Moon. He is a recipient of the IEEE ICC Best Paper Award, Qualcomm Innovation Awards, the KAIST Global Leader Fellowship, the KAIST EE Best Research Achievement Award, and an NRF Korea Postdoctoral Fellowship.
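For readers unfamiliar with mixup, the abstract above builds on the following standard augmentation (a minimal sketch of vanilla mixup, not of GenLabel itself; the toy inputs below are illustrative). GenLabel's contribution is to replace the linear soft label with one derived from learned class-conditional generative models.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Vanilla mixup (Zhang et al., 2018): a convex combination of two
    inputs and their one-hot labels, with lambda ~ Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    x = lam * x1 + (1 - lam) * x2
    # GenLabel would relabel x here using class-conditional generative
    # models instead of this linear interpolation of the labels.
    y = lam * y1 + (1 - lam) * y2
    return x, y

# Mix a class-0 example at the origin with a class-1 example at (4, 4).
x, y = mixup(np.array([0.0, 0.0]), np.array([1.0, 0.0]),
             np.array([4.0, 4.0]), np.array([0.0, 1.0]),
             rng=np.random.default_rng(0))
# y is a soft label lying on the segment between the two one-hot labels.
```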
Jan. 7, 2022: Hongyi Wang (CMU)
 Links: [Video] [arXiv]
 Title: On the Utility of Gradient Compression in Distributed Training Systems
 Abstract: A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training. To alleviate these bottlenecks, a long line of recent research proposes gradient and model compression methods. In this talk, I will first present an evaluation of the efficacy of gradient compression methods and comparisons of their scalability against optimized implementations of synchronous data-parallel SGD across more than 200 realistic distributed setups. Surprisingly, in only 6 of the 200 setups do gradient compression methods provide a promising speedup. I will then present our extensive investigation into the root causes of this phenomenon, along with a performance model that can be used to identify the benefits of gradient compression for a variety of system setups. Finally, I will propose a list of desirable properties that gradient compression methods should satisfy in order to provide meaningful utility in practice.
 Short Bio: Hongyi Wang is currently a postdoctoral research fellow in the Machine Learning Department at CMU, working with Eric Xing. He obtained his Ph.D. degree from the Computer Sciences Department at UW-Madison, advised by Dimitris Papailiopoulos. His research interests are machine learning algorithms and systems. Hongyi has developed several well-known frameworks for communication-efficient and robust distributed learning, which appeared at ICML, NeurIPS, ICLR, and MLSys. His recent work on federated learning has received considerable attention and won the Baidu Best Paper Award at the SpicyFL workshop at NeurIPS 2020.
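As a concrete example of the methods the talk evaluates, top-k sparsification is one common gradient compression baseline (a minimal sketch; the talk covers a broader family of methods, and this particular function is illustrative): only the k largest-magnitude coordinates, plus their indices, need to be communicated.

```python
import numpy as np

def topk_compress(grad, k):
    """Top-k sparsification: keep the k largest-magnitude entries of the
    gradient and zero out the rest."""
    idx = np.argsort(np.abs(grad))[-k:]   # indices of the k largest |g_j|
    out = np.zeros_like(grad)
    out[idx] = grad[idx]
    return out, idx

g = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
compressed, kept = topk_compress(g, 2)
# compressed == [0, -3.0, 0, 2.0, 0]; only 2 values + indices are sent
```

Whether such compression actually speeds up training depends on the system setup (network bandwidth, overlap of communication and computation), which is precisely the question the talk's performance model addresses.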
Dec. 10, 2021: Angeliki Giannou (UW-Madison)
 Links: [Video] [arXiv]
 Title: Survival of the strictest: Stable and unstable equilibria under regularized learning with partial information
 Abstract: In this work, we examine the Nash equilibrium convergence properties of no-regret learning in general N-player games. For concreteness, we focus on the archetypal follow-the-regularized-leader (FTRL) family of algorithms, and we consider the full spectrum of uncertainty that the players may encounter – from noisy, oracle-based feedback to bandit, payoff-based information. In this general context, we establish a comprehensive equivalence between the stability of a Nash equilibrium and its support: a Nash equilibrium is stable and attracting with arbitrarily high probability if and only if it is strict (i.e., each equilibrium strategy has a unique best response).
 Short Bio: Angeliki is a first-year Ph.D. student at UW-Madison, advised by Prof. Dimitris Papailiopoulos. Prior to that, she received Bachelor’s and M.Sc. degrees in ECE from the National Technical University of Athens, working with Prof. Panayotis Mertikopoulos. Her research interests involve machine learning theory and optimization.
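To make the FTRL family concrete, here is a minimal sketch of FTRL with an entropic regularizer (equivalently, exponential weights) in the full-information setting; the talk also covers the noisy and bandit feedback regimes, which this sketch does not model. The payoff sequence below is an illustrative toy.

```python
import numpy as np

def ftrl_entropic(payoff_seq, eta=0.1):
    """FTRL with entropic regularization: at each round, play the softmax
    of the cumulative payoff vector (a.k.a. exponential weights)."""
    score = np.zeros(payoff_seq.shape[1])
    strategies = []
    for u in payoff_seq:
        x = np.exp(eta * score)
        x /= x.sum()              # mixed strategy on the simplex
        strategies.append(x)
        score += u                # full-information payoff feedback
    return np.array(strategies)

# A game where action 0 is strictly dominant: the FTRL trajectory
# concentrates on it, illustrating convergence to a strict equilibrium.
payoffs = np.tile(np.array([1.0, 0.0, 0.0]), (200, 1))
traj = ftrl_entropic(payoffs)
# traj[0] is uniform; traj[-1] puts almost all mass on action 0
```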
Nov. 12, 2021: Chulhee Yun (MIT)
 Links: [Video] [arXiv]
 Title: Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond
 Abstract: In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods. Most existing analyses of these methods assume independent and unbiased gradient estimates obtained via with-replacement sampling. In contrast, we study shuffling-based variants: minibatch and local Random Reshuffling, which draw stochastic gradients without replacement and are thus closer to practice. For smooth functions satisfying the Polyak-Łojasiewicz condition, we obtain convergence bounds (in the large epoch regime) which show that these shuffling-based variants converge faster than their with-replacement counterparts. Moreover, we prove matching lower bounds showing that our convergence analysis is tight. Finally, we propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings. Joint work with Shashank Rajput (UW-Madison) and Suvrit Sra (MIT).
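The distinction the abstract draws can be sketched on a toy 1-D least-squares problem (illustrative only; the talk's analysis concerns the minibatch and local variants under the Polyak-Łojasiewicz condition): Random Reshuffling visits every sample exactly once per epoch, while with-replacement SGD draws i.i.d. indices.

```python
import numpy as np

def rr_epoch(w, xs, ys, lr, rng):
    """One epoch of Random Reshuffling for least squares:
    every sample is used exactly once, in a fresh random order."""
    for i in rng.permutation(len(xs)):
        grad = (w * xs[i] - ys[i]) * xs[i]   # d/dw of (w*x - y)^2 / 2
        w -= lr * grad
    return w

def with_replacement_epoch(w, xs, ys, lr, rng):
    """Same budget of n steps, but indices drawn i.i.d. with replacement,
    matching the assumption of most classical SGD analyses."""
    for _ in range(len(xs)):
        i = rng.integers(len(xs))
        grad = (w * xs[i] - ys[i]) * xs[i]
        w -= lr * grad
    return w

# Noiseless 1-D least squares with minimizer w* = 2.
rng = np.random.default_rng(0)
xs = rng.normal(size=50)
ys = 2.0 * xs
w_rr = 0.0
for _ in range(100):
    w_rr = rr_epoch(w_rr, xs, ys, 0.05, rng)
# w_rr converges to 2 after enough epochs
```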
Organizers: Jy-yong Sohn and Yuchen Zeng
Get the latest MLOPT Idea Seminar news: join the MLOPT Idea Seminar mailing list on Google Groups (you must have a Google account to join).