This week’s paper explores EvalGen, a mixed-initiative approach to aligning LLM-generated evaluation functions with human preferences. EvalGen assists users both in developing criteria for acceptable LLM outputs and in building functions that check those criteria, ensuring evaluations reflect the users’ own grading standards.
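To make the idea concrete, here is a minimal sketch (not EvalGen's actual code, and all names are illustrative assumptions) of pairing a natural-language criterion with candidate assertion functions, then using a few human grades to pick the candidate that best matches the user's own judgments:

```python
# Illustrative sketch: align a criterion's assertion function with human grades.
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Criterion:
    description: str                           # e.g. "Response is concise"
    candidates: List[Callable[[str], bool]]    # candidate assertion implementations


def conciseness_strict(output: str) -> bool:
    # One possible reading of "concise": at most 50 words.
    return len(output.split()) <= 50


def conciseness_loose(output: str) -> bool:
    # A looser reading of the same criterion: at most 100 words.
    return len(output.split()) <= 100


def alignment(candidate: Callable[[str], bool],
              graded: List[Tuple[str, bool]]) -> float:
    """Fraction of human thumbs-up/down grades the candidate agrees with."""
    return sum(candidate(out) == label for out, label in graded) / len(graded)


if __name__ == "__main__":
    criterion = Criterion("Response is concise",
                          [conciseness_strict, conciseness_loose])
    # A handful of human grades on sample LLM outputs.
    grades = [("Short answer.", True),
              ("word " * 80, True),
              ("word " * 200, False)]
    best = max(criterion.candidates, key=lambda c: alignment(c, grades))
    print(best.__name__, alignment(best, grades))
```

The point of the sketch is the selection step: rather than trusting the first auto-generated assertion, the user's grades determine which implementation of the criterion is kept.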
Read it on the blog: https://arize.com/blog/breaking-down-evalgen-who-validates-the-validators/
To learn more about ML observability, join the Arize AI Slack community or get the latest on our LinkedIn and Twitter.
Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models
Demystifying Chronos: Learning the Language of Time Series
Anthropic Claude 3
Reinforcement Learning in the Era of LLMs
Sora: OpenAI’s Text-to-Video Generation Model
RAG vs Fine-Tuning
Phi-2 Model
HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels
A Deep Dive Into Generative AI's Newest Models: Gemini vs Mistral (Mixtral-8x7B)–Part I
How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings
The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models
Explaining Grokking Through Circuit Efficiency
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior
Skeleton of Thought: LLMs Can Do Parallel Decoding
Llama 2: Open Foundation and Fine-Tuned Chat Models
Lost in the Middle: How Language Models Use Long Contexts
Orca: Progressive Learning from Complex Explanation Traces of GPT-4