Podbean logo
  • Discover
  • Podcast Features

    Your all-in-one podcasting solution.

    Podcast Studio

    Easy-to-use audio recorder app.

  • Livestream

    High-performing audio live, without limits.

  • Podcast App

    The best podcast player & podcast app.

  • Ads Marketplace

    Join Ads Marketplace to earn money
    through sponsorship on your podcast.

    PodAds

    Manage your ads with dynamic ad insertion capability.

  • Patron & Paid Content

    The seamless way for fans to support you directly
    from your podcast.

  • Apple Podcasts Subscriptions Integration

    Effortlessly publish and manage exclusive episodes for your
    Apple Podcasts subscribers directly from Podbean.

  • All Arts Business Comedy Education
  • Fiction Government Health & Fitness History Kids & Family
  • Leisure Music News Religion & Spirituality Science
  • Society & Culture Sports Technology True Crime TV & Film
  • Live
  • How to Start a Podcast
  • How to Start a Live Podcast
  • How to Monetize a podcast
  • How to Promote Your Podcast
  • How to Use Group Recording
  • Log in
  • Start your podcast for free
  • Podcasting
    • Podcast Features
    • Live Stream
    • PodAds
    • Podcast App
    • Podcast Studio
  • Monetization
    • Premium
    • Patron
    • Apple Podcasts Subscriptions Integration
    • Ads Marketplace
  • Enterprise
  • Pricing
  • Discover
  • Log in
    Sign up free
Papers Read on AI

Papers Read on AI

News:Tech News

Evaluating Large Language Models: A Comprehensive Survey

Evaluating Large Language Models: A Comprehensive Survey

2023-11-08
Download
Large language models (LLMs) have demonstrated remarkable capabilities across a broad spectrum of tasks. They have attracted significant attention and been deployed in numerous downstream applications. Nevertheless, akin to a double-edged sword, LLMs also present potential risks. They could suffer from private data leaks or yield inappropriate, harmful, or misleading content. Additionally, the rapid progress of LLMs raises concerns about the potential emergence of superintelligent systems without adequate safeguards. To effectively capitalize on LLM capacities as well as ensure their safe and beneficial development, it is critical to conduct a rigorous and comprehensive evaluation of LLMs. This survey endeavors to offer a panoramic perspective on the evaluation of LLMs. We categorize the evaluation of LLMs into three major groups: knowledge and capability evaluation, alignment evaluation and safety evaluation. In addition to the comprehensive review on the evaluation methodologies and benchmarks on these three aspects, we collate a compendium of evaluations pertaining to LLMs' performance in specialized domains, and discuss the construction of comprehensive evaluation platforms that cover LLM evaluations on capabilities, alignment, safety, and applicability. We hope that this comprehensive overview will stimulate further research interests in the evaluation of LLMs, with the ultimate goal of making evaluation serve as a cornerstone in guiding the responsible development of LLMs. We envision that this will channel their evolution into a direction that maximizes societal benefit while minimizing potential risks. A curated list of related papers has been publicly available at https://github.com/tjunlp-lab/Awesome-LLMs-Evaluation-Papers.

2023: Zishan Guo, Renren Jin, Chuang Liu, Yufei Huang, Dan Shi, Supryadi, Linhao Yu, Yan Liu, Jiaxuan Li, Bojian Xiong, Deyi Xiong



https://arxiv.org/pdf/2310.19736v2.pdf
view more

More Episodes

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
2023-11-29 49
Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
2023-11-28 82
Exponentially Faster Language Modelling
2023-11-27 92
Orca 2: Teaching Small Language Models How to Reason
2023-11-24 176
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
2023-11-23 125
A Survey on Language Models for Code
2023-11-22 156
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
2023-11-21 121
Learning to Filter Context for Retrieval-Augmented Generation
2023-11-20 128
GraphCast: Learning skillful medium-range global weather forecasting
2023-11-19 108
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
2023-11-17 128
CogVLM: Visual Expert for Pretrained Language Models
2023-11-16 110
How Can Recommender Systems Benefit from Large Language Models: A Survey
2023-11-15 134
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs
2023-11-14 126
Generative Pretraining in Multimodality
2023-11-10 163
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
2023-11-03 169
iTransformer: Inverted Transformers Are Effective for Time Series Forecasting
2023-11-02 134
Zephyr: Direct Distillation of LM Alignment
2023-11-01 146
Large Language Models for Software Engineering: Survey and Open Problems
2023-10-31 200
Eureka: Human-Level Reward Design via Coding Large Language Models
2023-10-30 124
  • ←
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • →
0123456789101112131516171819

Get this podcast on your
phone, FREE

Download Podbean app on App Store Download Podbean app on Google Play

Create your
podcast in
minutes

  • Full-featured podcast site
  • Unlimited storage and bandwidth
  • Comprehensive podcast stats
  • Distribute to Apple Podcasts, Spotify, and more
  • Make money with your podcast
Get started

It is Free

  • Podcast Services

    • Podcast Features
    • Pricing
    • Enterprise Solution
    • Private Podcast
    • The Podcast App
    • Live Stream
    • Audio Recorder
    • Remote Recording
  •  
    • Create a Podcast
    • Video Podcast
    • Start Podcasting
    • Start Radio Talk Show
    • Education Podcast
    • Church Podcast
    • Nonprofit Podcast
    • Get Sermons Online
    • Free Audiobooks
  • MONETIZATION & MORE

    • Podcast Advertising
    • Dynamic Ads Insertion
    • Patron Program
    • Apple Podcasts Subscriptions
    • Switch to Podbean
    • Submit Your Podcast
    • Podbean Plugins
    • Developers
  • KNOWLEDGE BASE

    • How to Start a Podcast
    • How to Start a Live Podcast
    • How to Monetize a podcast
    • How to Promote Your Podcast
    • How to Use Group Recording
  • Support

    • Support Center
    • What’s New
    • Free Webinars
    • Podcast Events
    • Podbean Academy
    • Podcasting Smarter
    • Badges
    • Resources
  • Podbean

    • About Us
    • Podbean Blog
    • Careers
    • Press and Media
    • Green Initiative
    • Affiliate Program
    • Contact Us
  • Privacy Policy
  • Cookie Policy
  • Terms of Use
  • Consent Preferences
  • Copyright © 2015-2023 Podbean.com