Datasets

Various (math evaluation) datasets

from dart_math.data import *

Preset Datasets

Preset datasets so far:

Dataset	ID	Size	Stored At	Source
MATH/Test	`"math/test"`	5000	🤗 HuggingFace	🤗 hendrycks/competition_math
MATH/Train	`"math/train"`	7500	🤗 HuggingFace	🤗 hendrycks/competition_math
GSM8K/Test	`"gsm8k/test"`	1319	🤗 HuggingFace	🤗 gsm8k
GSM8K(Fixed)/Train (DEPRECATED: GSM8K/Train)	`"gsm8k-fix/train"` (DEPRECATED: `"gsm8k/train"`)	7473	🤗 HuggingFace (DEPRECATED: 🤗 HuggingFace)	🤗 gsm8k
MWPBench/CollegeMath/Test	`"mwpbench/college-math/test"`	2818	🎯 dart/data/dsets	🐱 microsoft/unilm/mathscale/MWPBench
MWPBench/CollegeMath/Train	`"mwpbench/college-math/train"`	1281	🎯 dart/data/dsets	🐱 microsoft/unilm/mathscale/MWPBench
MWPBench/GaokaoBench	`"mwpbench/gaokaobench"`	508	🎯 dart/data/dsets	🐱 microsoft/unilm/mathscale/MWPBench
MWPBench/FreshGaokaoMath2023	`"mwpbench/fresh-gaokao-math-2023"`	30	🎯 dart/data/dsets	🐱 microsoft/unilm/mathscale/MWPBench
DeepMind Mathematics	`"deepmind-mathematics"`	1000	🎯 dart/data/dsets	🐱 google-deepmind/mathematics_dataset
OlympiadBench-Math	`"olympiadbench/OE_TO_maths_en_COMP"`	675	🎯 dart/data/dsets	🐱 OpenBMB/OlympiadBench
TheoremQA	`"theoremqa"`	800	🎯 dart/data/dsets	🐱 TIGER-AI-Lab/TheoremQA
Odyssey-Math	`"odyssey-math"`	386	🎯 dart/data/dsets	🐱 protagolabs/odyssey-math
AOPS	`"aops"`	3886	🎯 dart/data/dsets	🌐 AOPS

To use other datasets, add them yourself as described under load_query_dps.

load_query_dps

 load_query_dps (dataset:str|list[str]='math-test',
                 max_n_trials:int|list[int]=1,
                 min_n_corrects:int|list[int]=0,
                 prompt_template:str='alpaca', n_shots:int=-1)

Load dataset(s) as QueryDataPoints. To use a dataset that is not preset, either add it here following the format of the existing datasets, or pass the path to a .json dataset file, whose stem name is used as the dataset ID.

	Type	Default	Details
dataset	str \| list[str]	math-test	(List of) dataset IDs or paths to datasets whose samples have “query” and “ref_ans” fields. When a path is given, the other two arguments are not used.
max_n_trials	int \| list[int]	1	(List of) maximum number of raw responses to be generated for each dataset. Non-positive value or `None` means no limit.
min_n_corrects	int \| list[int]	0	(List of) minimum number of correct responses to be generated for each dataset. Non-positive value or `None` means no limit.
prompt_template	str	alpaca	ID / Path of the prompt template.
n_shots	int	-1	Number of examples in the few-shot prompt. Negative means adaptive to the datasets.
Returns	list		List of `QueryDataPoint`s to be input to `dart.gen.gen`.
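
As a minimal sketch of the custom-dataset route described above, the snippet below writes a small .json file whose samples carry "query" and "ref_ans" fields. It assumes the file is a single JSON array of objects (the exact on-disk layout is not specified here); the file name `my-dataset.json` and the helper `write_custom_dataset` are hypothetical. The final call is left commented out because it needs `dart_math` installed.

```python
import json
import tempfile
from pathlib import Path


def write_custom_dataset(samples: list, path: Path) -> Path:
    """Write samples to `path` after checking the two required fields."""
    for i, sample in enumerate(samples):
        missing = {"query", "ref_ans"} - set(sample)
        if missing:
            raise ValueError(f"sample {i} is missing fields: {sorted(missing)}")
    # Assumption: the dataset file is one JSON array of sample objects.
    path.write_text(json.dumps(samples, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


samples = [
    {"query": "What is 1 + 1?", "ref_ans": "2"},
    {"query": "Solve 2x = 6 for x.", "ref_ans": "3"},
]
# Hypothetical file name; its stem ("my-dataset") would serve as the dataset ID.
dataset_path = write_custom_dataset(samples, Path(tempfile.mkdtemp()) / "my-dataset.json")

# With dart_math installed, the path could then be passed as `dataset`:
# from dart_math.data import load_query_dps
# query_dps = load_query_dps(dataset=str(dataset_path))
```

Validating the fields before writing keeps malformed samples from surfacing only later, at load time.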

Unified Data Templates

We unify the data format across dart.

QueryDataPoint

 QueryDataPoint (dataset:str, query:str, ref_ans:str,
                 prompt_template:dart_math.utils.PromptTemplate='alpaca',
                 n_shots:int=-1, n_trials:int=0, n_corrects:int=0,
                 max_n_trials:int|None=None, min_n_corrects:int|None=None,
                 **kwargs:dict[str,typing.Any])

The query-level data point on which to generate responses with vllm using sampling_params (and to evaluate them with evaluator).

	Type	Default	Details
dataset	str		The dataset name that the query belongs to. E.g. “math”.
query	str		Raw query, without other prompt.
ref_ans	str		The short reference answer to the `query`.
prompt_template	PromptTemplate	alpaca	The prompt template object to use.
n_shots	int	-1	Number of examples in the few-shot prompt. Negative means adaptive to the datasets.
n_trials	int	0	Number of raw responses already generated for the `query`.
n_corrects	int	0	Number of correct responses already generated for the `query`.
max_n_trials	int \| None	None	Maximum number of trials to generate a response. `None` or a negative value means no limit.
min_n_corrects	int \| None	None	Minimum number of correct responses to generate. `None` or a negative value means no limit.
kwargs	dict		Other fields to store.
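
To illustrate how the counters and limits above could interact, here is a minimal sketch that uses a stand-in dataclass rather than the real `QueryDataPoint`. It assumes that sampling for a query stops once either `max_n_trials` trials have been generated or `min_n_corrects` correct responses have been collected, and that `None` or a non-positive value disables the corresponding limit. The class `QueryBudget` and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryBudget:
    """Hypothetical stand-in mirroring the counter and limit fields."""

    n_trials: int = 0
    n_corrects: int = 0
    max_n_trials: Optional[int] = None
    min_n_corrects: Optional[int] = None

    @staticmethod
    def _is_limit(value: Optional[int]) -> bool:
        # Assumption: None or a non-positive value means "no limit".
        return value is not None and value > 0

    def needs_more_trials(self) -> bool:
        """Return False once either active limit has been reached."""
        if self._is_limit(self.max_n_trials) and self.n_trials >= self.max_n_trials:
            return False
        if self._is_limit(self.min_n_corrects) and self.n_corrects >= self.min_n_corrects:
            return False
        # With both limits disabled this is always True.
        return True

    def record(self, correct: bool) -> None:
        """Account for one newly generated response."""
        self.n_trials += 1
        self.n_corrects += int(correct)


budget = QueryBudget(max_n_trials=4, min_n_corrects=2)
for outcome in [False, True, True, True]:
    if not budget.needs_more_trials():
        break
    budget.record(outcome)
print(budget.n_trials, budget.n_corrects)  # prints "3 2"
```

In this run the loop stops after the third response because the second correct answer satisfies `min_n_corrects`, before `max_n_trials` is reached.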

RespSampleBase

 RespSampleBase (dataset:str, query:str, ref_ans:str, resp:str, agent:str,
                 prompt_template:str=None, ans:str=None,
                 correct:bool=None)

The response-level data point containing the query-level data point and other response-level information.

	Type	Default	Details
dataset	str		The dataset name that the query belongs to.
query	str		The input query to generate responses on.
ref_ans	str		The reference answer to the query.
resp	str		The generated response.
agent	str
prompt_template	str	None
ans	str	None	The answer in the generated response, by default None
correct	bool	None	Whether the generated response is correct, by default None

RespSampleVLLM

 RespSampleVLLM (dataset:str, query:str, ref_ans:str, abs_tol:float=None,
                 resp:str=None, finish_reason:str=None,
                 stop_reason:str=None, cumulative_logprob:float=None,
                 ans:str=None, correct:bool=None, **kwargs)

The response-level data point from a vllm model, containing extra fields like finish_reason, stop_reason, cumulative_logprob.

	Type	Default	Details
dataset	str		The dataset name that the query belongs to.
query	str		The input query to generate responses on.
ref_ans	str		The reference answer to the query.
abs_tol	float	None	The absolute tolerance of the answer.
resp	str	None	The generated response.
finish_reason	str	None	The reason for finishing the generation from `vllm`
stop_reason	str	None	The reason for stopping the generation from `vllm`, e.g. EoS token.
cumulative_logprob	float	None	The cumulative log probability of the generated response.
ans	str	None	The answer in the generated response.
correct	bool	None	Whether the generated response is correct.
kwargs			Other fields to store.
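
The `abs_tol` field suggests that numeric answers may be compared within an absolute tolerance. The following is only a sketch of that idea, not the evaluator's actual logic: the helper `answers_match` is hypothetical, and falling back to an exact string comparison when no tolerance is set or parsing fails is an assumption.

```python
import math
from typing import Optional


def answers_match(ans: str, ref_ans: str, abs_tol: Optional[float] = None) -> bool:
    """Compare an extracted answer with the reference answer (hypothetical helper)."""
    if abs_tol is not None:
        try:
            # Numeric comparison using only the absolute tolerance.
            return math.isclose(float(ans), float(ref_ans), rel_tol=0.0, abs_tol=abs_tol)
        except ValueError:
            pass  # Not parseable as numbers; fall through to string comparison.
    # Assumed fallback: exact match after trimming whitespace.
    return ans.strip() == ref_ans.strip()


print(answers_match("3.1416", "3.14159", abs_tol=1e-3))  # True
print(answers_match("3.15", "3.14159", abs_tol=1e-3))    # False
print(answers_match(" x+1 ", "x+1"))                     # True
```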