Session Information
Cluster 1
Foundational AI
Co-Chairs
Jinwoo Shin, Seon Joo Kim, Ho Bae
Description
AI technologies, built on common technical principles such as deep learning, have demonstrated remarkable success and potential across a variety of fields. Here, we invite distinguished researchers to share recent advances in these foundational AI technologies, in both software and hardware, as well as real-world applications, including manufacturing and healthcare.
#Foundational AI
#AI for Manufacturing and Healthcare
Program
Day 1 (November 2)

| Time | Talk | Speaker |
|---|---|---|
| 10:00~11:05 (Chair: Seon Joo Kim) | Three Visual Intelligence Areas to Watch in Smart Manufacturing | Jiangbo Lu (SmartMore Corp.) |
| | Deep Learning for Videos: Representation and Generation | Jinwoo Shin (KAIST) |
| 15:30~17:00 (Chair: Jinwoo Shin) | Unlocking Rich Representations in Pretrained Vision-Language Models | Simon Kornblith (Google DeepMind) |
| | Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning | Seon Joo Kim (Yonsei U.) |
| | Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Choong Seon Hong (Kyung Hee U.) |

Day 2 (November 3)

| Time | Talk | Speaker |
|---|---|---|
| 10:00~11:05 (Chair: Ho Bae) | Transforming Healthcare with AI – Challenges and Solutions | Stefan Winkler (ASUS) |
| | Self-Evolving Hardware Intelligence: From Pre-training and Model Compression to GPU/System Optimization | Dongbo Min (Ewha Womans U.) |
Talk Title
Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning
Abstract
Object-centric learning (OCL) aspires to a general and compositional understanding of scenes by representing a scene as a collection of object-centric representations. OCL has also been extended to multi-view image and video datasets, where geometric or temporal information in the multi-image data enables various data-driven inductive biases. Single-view images carry less information about how to disentangle a given scene than videos or multi-view images do. Hence, owing to the difficulty of applying inductive biases, OCL for single-view images remains challenging, resulting in inconsistent learning of object-centric representations. To address this, we introduce a novel OCL framework for single-view images, SLot Attention via SHepherding (SLASH), which consists of two simple-yet-effective modules on top of Slot Attention. The new modules, the Attention Refining Kernel (ARK) and the Intermediate Point Predictor and Encoder (IPPE), respectively prevent slots from being distracted by background noise and indicate locations for slots to focus on, facilitating the learning of object-centric representations. We also propose a weak semi-supervision approach for OCL, while the framework requires no auxiliary annotation at inference time. Experiments show that our method enables consistent learning of object-centric representations and achieves strong performance across four datasets.
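To make the mechanism concrete, here is a minimal sketch (not the authors' code) of one Slot Attention iteration with an ARK-style smoothing step: each slot's attention map over the spatial grid is low-pass filtered so slots are less easily distracted by high-frequency background noise. All shapes, the `blur` module, and the GRU update are illustrative assumptions.

```python
# A hedged sketch of Slot Attention with an ARK-like attention-refining step.
# Not the SLASH implementation; shapes, projections, and the blur kernel are assumptions.
import torch

def slot_attention_step(slots, inputs, q_proj, k_proj, v_proj, gru, grid_hw, blur):
    # slots: (B, S, D); inputs: (B, N, D), where N = H * W flattened image features
    B, S, D = slots.shape
    q, k, v = q_proj(slots), k_proj(inputs), v_proj(inputs)
    logits = torch.einsum("bsd,bnd->bsn", q, k) / D ** 0.5
    attn = logits.softmax(dim=1)  # normalize over slots, as in Slot Attention

    # ARK-like refinement (assumption): spatially smooth each slot's attention map
    # with a fixed low-pass kernel so isolated background pixels lose influence.
    H, W = grid_hw
    attn = blur(attn.reshape(B * S, 1, H, W)).reshape(B, S, H * W)
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)  # weighted mean

    updates = torch.einsum("bsn,bnd->bsd", attn, v)
    return gru(updates.reshape(-1, D), slots.reshape(-1, D)).reshape(B, S, D)
```

Here `blur` could be, for example, a `torch.nn.Conv2d(1, 1, 5, padding=2, bias=False)` holding a fixed Gaussian kernel, and `gru` a `torch.nn.GRUCell(D, D)`.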
Short bio
Seon Joo Kim received his B.S. and M.S. degrees from Yonsei University, Seoul, Republic of Korea, in 1997 and 2001, and his Ph.D. degree in Computer Science from the University of North Carolina at Chapel Hill in 2008. He is currently an Underwood Distinguished Professor in the Department of Computer Science, Yonsei University. He has served as a Senior Area Chair for CVPR 2023 and as an Area Chair for CVPR 2016, 2018, and 2020-2022, and will also serve as an Area Chair for ICCV 2023 and NeurIPS 2023. He serves on the Editorial Boards of IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and the International Journal of Computer Vision (IJCV). He is a Senior Member of the IEEE. His research interests include computer vision, specifically computational photography, video understanding, and video processing.
Talk Title
Three Visual Intelligence Areas to Watch in Smart Manufacturing
Abstract
Advanced technologies and solutions, reminiscent of their predecessors such as steam engines and mechanical looms, serve as the driving forces propelling the manufacturing industry into a new era. This revolutionary process, widely known as smart manufacturing, entails the swift adoption and optimization of artificial intelligence, data science, synergistic sensor systems, and other cutting-edge elements. These advancements have exhibited tremendous potential for practical applications, already yielding substantial added value. Specifically, from the perspective of computer vision and machine intelligence, I will delve into a comprehensive discussion on three key areas that demand intensive research attention in the pursuit of developing and implementing best-fit AI technologies for smart manufacturing. This presentation will unfold across three main clusters: 1) the "smart brain" (primarily concerned with data), 2) smart vision (focused on visual perception), and 3) smart sensor systems (concerned with optimization).
Short bio
Dr. Jiangbo Lu is the Co-founder and Chief Technology Officer (CTO) of SmartMore Corporation (https://global.smartmore.com/), a fast-growing unicorn company founded by leading experts with over 20 years of experience in computer vision and machine intelligence. SmartMore focuses on providing all-in-one solutions for the smart manufacturing and digital experience industries, and has a global presence with offices in Hong Kong, Shenzhen, Shanghai, Beijing, Singapore, and Tokyo, among others. The company has served hundreds of leading enterprises, including several Fortune 500 companies. Dr. Lu also currently serves as an Adjunct Professor at South China University of Technology (SCUT). During 2017-2019, he was the CTO & Chief Scientist of a fashion and retail AI tech startup in Shenzhen, where he led R&D on award-winning products such as an AI virtual fitting mirror, 3D body scanning and measuring systems, and a series of leading 3D human body reconstruction and clothing technologies. From 2009 to 2016, he was a Senior Research Scientist with the Advanced Digital Sciences Center (ADSC), a Singapore-based research center of the University of Illinois at Urbana-Champaign (UIUC). As the first technical staff member to join ADSC, he led and worked on several use-inspired research projects spanning basic research, applied research, and technology commercialization. His techniques led to two start-ups and multiple technology licenses. Before joining UIUC-ADSC in September 2009, he worked with IMEC (Leuven), Microsoft Research Asia (Beijing), and VIA-S3 Graphics (Shanghai). Dr. Lu's research interests include computer vision, 3D vision, image/video/signal processing, and interactive multimedia. He has published around 100 papers and holds over 100 granted patents. He was an Associate Editor for IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) from 2012 to 2016, and received the 2012 TCSVT Best Associate Editor Award. He and his team won a DEMOguru Award at the inaugural DEMO Asia 2012 conference and the Best Paper Award at the IEEE ICCV 2009 Workshop on Embedded Computer Vision, among several other honors. He received his Ph.D. degree from K.U. Leuven, Belgium, and his B.S. (Honor Mixed Class Program) and M.S. degrees from Zhejiang University, China. He is a Senior Member of the IEEE.
Talk Title
Deep Learning for Videos: Representation and Generation
Abstract
In the last ten years, there has been remarkable progress in deep learning models for image-related tasks, including classification, generation, and representation. However, progress on video-related tasks has arguably been slower, despite their practical importance. In this talk, I will present my recent work on deep learning methods for better modeling videos.
Short bio
Jinwoo Shin is currently a KAIST endowed chair professor (jointly affiliated) in the Kim Jaechul Graduate School of AI and the School of Electrical Engineering at KAIST. He obtained B.S. degrees (in Math and CS) from Seoul National University in 2001 and a Ph.D. degree (in Math) from the Massachusetts Institute of Technology in 2010, receiving the George M. Sprowls Award (for the best MIT CS Ph.D. theses). He was a postdoctoral researcher at the Algorithms & Randomness Center, Georgia Institute of Technology, from 2010 to 2012, and at the Business Analytics and Mathematical Sciences Department, IBM T. J. Watson Research Center, from 2012 to 2013. Dr. Shin's early work was mostly on applied probability and theoretical computer science. After joining KAIST in Fall 2013, he began working on the algorithmic foundations of machine learning. He received the Rising Star Award in 2015 from the ACM Special Interest Group on computer systems performance evaluation (SIGMETRICS). He also received the Kenneth C. Sevcik Award at ACM SIGMETRICS/Performance 2009, the Best Publication Award from the INFORMS Applied Probability Society in 2013, the Best Paper Award at ACM MOBIHOC 2013, the Bloomberg Scientific Research Award in 2015, and the ACM SIGMETRICS Test of Time Award in 2019.
Talk Title
Transforming Healthcare with AI – Challenges and Solutions
Abstract
AI has demonstrated remarkable potential in a variety of domains. The healthcare sector, too, is beginning to adopt these transformative technologies, which promise to make operational processes more efficient, reduce healthcare expenditure, identify more effective treatment approaches, and facilitate the early detection and prevention of disease. At the same time, AI solutions in this domain need to pay close attention to the intricacies of clinical workflows.
During this presentation, I will discuss some of the challenges in healthcare today, before delving into recent innovations in natural language processing and computer vision that we have developed. Through illustrative examples, I will showcase how these advancements are reshaping hospital operations and care delivery.
Short bio
Stefan Winkler is Research Director of Asus Intelligent Cloud Services (AICS) as well as Adjunct Associate Professor at the National University of Singapore (NUS). Prior to that he was Deputy Director at AI Singapore. He also co-founded two start-ups (Genista and Opsis) and worked for a Silicon Valley company.
Dr. Winkler has a Ph.D. degree from the École Polytechnique Fédérale de Lausanne (EPFL), Switzerland, and a Dipl.-Ing. (M.Eng./B.Eng.) degree from the University of Technology Vienna, Austria. He is an IEEE Fellow and has published over 150 papers. He has also contributed to international standards in VQEG, ITU, ATIS, VSF, and SCTE.
Talk Title
Self-Evolving Hardware Intelligence: From Pre-training and Model Compression to GPU/System Optimization
Abstract
Successful deployment of AI models requires advances on multiple fronts, including the models themselves, training strategies, and efficient hardware design. Beyond developing new AI models, we aim to expand our research scope toward an integrated perspective on software and hardware. In this talk, I introduce our recent achievements across this range, from large-scale pre-trained models and model compression to GPU/AI workload optimization. More specifically, I present how to leverage pre-trained models for parameter-efficient tuning, a dense prediction classifier learned from a vision-language model, model compression techniques using quantization and knowledge distillation, data preloading for GPUs, and system memory management for deep learning models. Lastly, I will conclude the talk with some remarks on future work.
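As one generic illustration of the compression recipe mentioned above (quantization plus knowledge distillation), here is a minimal, hedged sketch of a standard distillation loss; it is not the speaker's implementation, and `T` and `alpha` are illustrative defaults.

```python
# A generic knowledge-distillation loss: a full-precision teacher supervises a
# smaller (or quantized) student via a temperature-softened KL term plus the task loss.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps gradient magnitudes comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)  # ordinary task loss
    return alpha * soft + (1.0 - alpha) * hard
```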
Short bio
Dongbo Min received his Ph.D. degree from Yonsei University in 2009. From 2009 to 2010, he was a postdoctoral researcher with Mitsubishi Electric Research Laboratories (MERL), USA. From 2010 to 2015, he was with the Advanced Digital Sciences Center (ADSC), Singapore. From 2015 to 2018, he was an assistant professor in the Department of Computer Science and Engineering, Chungnam National University, Korea. Since 2018, he has been with the Department of Computer Science and Engineering, Ewha Womans University, Korea.
Talk Title
Faster Segment Anything: Towards Lightweight SAM for Mobile Applications
Abstract
In this work, we aim to make SAM mobile-friendly by replacing the heavyweight image encoder with a lightweight one. Naively training such a new SAM as in the original SAM leads to unsatisfactory performance. Therefore, we distill the knowledge from the heavy image encoder (ViT-H in the original SAM) into a lightweight image encoder, which is automatically compatible with the mask decoder in the original SAM. The training can be completed on a single GPU in less than one day, and the resulting lightweight SAM, termed MobileSAM, is more than 60 times smaller yet performs on par with the original SAM. Concerning inference speed, with a single GPU, MobileSAM runs in around 10ms per image: 8ms for the image encoder and 4ms for the mask decoder. With superior performance, our MobileSAM is around 5 times faster than FastSAM and 7 times smaller, making it more suitable for mobile applications.
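The encoder distillation described above can be sketched as a simple regression from the lightweight student's embeddings to the frozen teacher's. This is a hedged illustration under assumed shapes (both encoders producing embeddings of the same size), not the MobileSAM training code.

```python
# A minimal sketch of decoupled encoder distillation: the lightweight student
# regresses the frozen ViT-H-style teacher's image embeddings directly, so the
# student stays compatible with the original mask decoder. Shapes are assumptions.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, images, optimizer):
    with torch.no_grad():
        target = teacher(images)        # frozen heavy image encoder (teacher)
    pred = student(images)              # lightweight image encoder being trained
    loss = F.mse_loss(pred, target)     # regress the teacher's embeddings
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```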
Short bio
Choong Seon Hong received the B.S. and M.S. degrees in electronic engineering from Kyung Hee University, Seoul, South Korea, in 1983 and 1985, respectively, and the Ph.D. degree from Keio University, Tokyo, Japan, in 1997. In 1988, he joined KT, South Korea, where he was involved in broadband networks as a member of the technical staff. In 1993, he joined Keio University, Japan. He was with the Telecommunications Network Laboratory, KT, as a Senior Member of Technical Staff and as the Director of the Networking Research Team until 1999. Since 1999, he has been a Professor in the Department of Computer Science and Engineering at Kyung Hee University. His research interests include machine learning, mobile computing, federated learning, and satellite networking.
Talk Title
Unlocking Rich Representations in Pretrained Vision-Language Models
Abstract
Foundation models for vision and language compress a large amount of human knowledge into a comparatively small set of weights. The utility of these models depends not only on the representations of human knowledge captured in the weights, but also on our ability to make use of those representations. In this talk, I will describe research focused on better harnessing the representations of pretrained models. In the first part, I'll describe a strategy to generate image captions that are more specific than the captions in the training data by altering the sampling strategy. Our approach produces captions that score substantially worse according to metrics that compare generated captions to ground truth, but substantially better according to metrics that measure the similarity between captions and images directly in the embedding space of contrastive image/text models. In the second part, I'll show how, using similarity judgments for fewer than 2,000 images, we can restructure the image representations of contrastive image/text models. These restructured representations are better aligned with human representations and yield better performance on few-shot learning and anomaly detection tasks. Taken together, these results demonstrate promising directions for making better use of the rich representations learned by large pretrained models to generate more informative descriptions and improve generalization with limited supervision.
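To make the embedding-space metric concrete, here is a hedged sketch: score candidate captions by cosine similarity to the image in a CLIP-style joint embedding space, rather than by overlap with reference captions. `image_encoder` and `text_encoder` stand in for any contrastive image/text model and are assumptions, not the authors' code.

```python
# Rank sampled captions by image-text similarity in a contrastive embedding space.
import torch
import torch.nn.functional as F

def rank_captions(image, captions, image_encoder, text_encoder):
    # image: (C, H, W) tensor; captions: list of candidate strings the
    # (assumed) text_encoder can embed into the shared space.
    with torch.no_grad():
        img = F.normalize(image_encoder(image.unsqueeze(0)), dim=-1)  # (1, D)
        txt = F.normalize(text_encoder(captions), dim=-1)             # (C, D)
    sims = (txt @ img.T).squeeze(-1)   # cosine similarity per candidate caption
    order = sims.argsort(descending=True)
    return [captions[i] for i in order], sims[order]
```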
Short bio
Simon Kornblith is a member of the pretraining team at Anthropic. Prior to his current role, he was a Staff Research Scientist at Google DeepMind. He received his Ph.D. in Brain and Cognitive Sciences from MIT in 2017, where he studied the neural mechanisms of working memory and scene processing. He is the co-author of more than 50 publications, including influential papers on transfer learning, self-supervised learning, and neural network representations. He was also one of the initial developers of Zotero, a free and open-source tool for managing research sources and citations.