Datasets

Various (math evaluation) datasets
from dart_math.data import *

Preset Datasets

Preset datasets so far:

| Dataset | ID | Size | Stored At | Source |
|---|---|---|---|---|
| MATH/Test | "math/test" | 5000 | 🤗 HuggingFace | 🤗 hendrycks/competition_math |
| MATH/Train | "math/train" | 7500 | 🤗 HuggingFace | 🤗 hendrycks/competition_math |
| GSM8K/Test | "gsm8k/test" | 1319 | 🤗 HuggingFace | 🤗 gsm8k |
| GSM8K(Fixed)/Train (DEPRECATED: GSM8K/Train) | "gsm8k-fix/train" (DEPRECATED: "gsm8k/train") | 7473 | 🤗 HuggingFace (DEPRECATED: 🤗 HuggingFace) | 🤗 gsm8k |
| MWPBench/CollegeMath/Test | "mwpbench/college-math/test" | 2818 | 🎯 dart/data/dsets | 🐱 microsoft/unilm/mathscale/MWPBench |
| MWPBench/CollegeMath/Train | "mwpbench/college-math/train" | 1281 | 🎯 dart/data/dsets | 🐱 microsoft/unilm/mathscale/MWPBench |
| MWPBench/GaokaoBench | "mwpbench/gaokaobench" | 508 | 🎯 dart/data/dsets | 🐱 microsoft/unilm/mathscale/MWPBench |
| MWPBench/FreshGaokaoMath2023 | "mwpbench/fresh-gaokao-math-2023" | 30 | 🎯 dart/data/dsets | 🐱 microsoft/unilm/mathscale/MWPBench |
| DeepMind Mathematics | "deepmind-mathematics" | 1000 | 🎯 dart/data/dsets | 🐱 google-deepmind/mathematics_dataset |
| OlympiadBench-Math | "olympiadbench/OE_TO_maths_en_COMP" | 675 | 🎯 dart/data/dsets | 🐱 OpenBMB/OlympiadBench |
| TheoremQA | "theoremqa" | 800 | 🎯 dart/data/dsets | 🐱 TIGER-AI-Lab/TheoremQA |
| Odyssey-Math | "odyssey-math" | 386 | 🎯 dart/data/dsets | 🐱 protagolabs/odyssey-math |
| AOPS | "aops" | 3886 | 🎯 dart/data/dsets | 🌐 AOPS |

For other datasets, see load_query_dps for how to add them yourself.



load_query_dps

 load_query_dps (dataset:str|list[str]='math-test',
                 max_n_trials:int|list[int]=1,
                 min_n_corrects:int|list[int]=0,
                 prompt_template:str='alpaca', n_shots:int=-1)

Load dataset(s) as QueryDataPoints. To use a dataset that is not preset, either add it here following the format of the existing datasets, or pass the path to its .json file, whose stem name is used as the dataset ID.

| | Type | Default | Details |
|---|---|---|---|
| dataset | str \| list[str] | math-test | (List of) dataset ID or path to a dataset of samples with “query” and “ref_ans” fields. A path does not use the other two arguments. |
| max_n_trials | int \| list[int] | 1 | (List of) maximum number of raw responses to be generated for each dataset. A non-positive value or None means no limit. |
| min_n_corrects | int \| list[int] | 0 | (List of) minimum number of correct responses to be generated for each dataset. A non-positive value or None means no limit. |
| prompt_template | str | alpaca | ID / path of the prompt template. |
| n_shots | int | -1 | |
| Returns | list | | QueryDataPoints to be input to dart.gen.gen. |
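As an illustration of the path option described above, the following minimal sketch writes a small custom dataset file using only the standard library. The exact file layout (a JSON list of objects) is an assumption; the documentation only states that samples need “query” and “ref_ans” fields and that the stem name serves as the dataset ID. The file name `my-math.json` and the wrapper `load_custom_queries` are hypothetical, and the wrapper is defined but not called, since it requires dart_math to be installed.

```python
import json
import tempfile
from pathlib import Path

# Assumed layout: a JSON list of samples, each with "query" and "ref_ans".
samples = [
    {"query": "What is 1 + 1?", "ref_ans": "2"},
    {"query": "Compute 3 * 4.", "ref_ans": "12"},
]

# The file stem ("my-math") is what would serve as the dataset ID.
dataset_path = Path(tempfile.mkdtemp()) / "my-math.json"
dataset_path.write_text(json.dumps(samples, indent=2), encoding="utf-8")


def load_custom_queries(path):
    """Hypothetical wrapper: pass the .json path as the `dataset` argument."""
    # Imported lazily so that this sketch runs without dart_math installed.
    from dart_math.data import load_query_dps

    return load_query_dps(dataset=str(path), prompt_template="alpaca")


print(dataset_path.stem)
```

If dart_math is available, calling `load_custom_queries(dataset_path)` should then return the query-level data points for this file, assuming the layout above matches what the loader expects.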

Unified Data Templates

We unify the data format across dart.



QueryDataPoint

 QueryDataPoint (dataset:str, query:str, ref_ans:str,
                 prompt_template:dart_math.utils.PromptTemplate='alpaca',
                 n_shots:int=-1, n_trials:int=0, n_corrects:int=0,
                 max_n_trials:int|None=None, min_n_corrects:int|None=None,
                 **kwargs:dict[str,typing.Any])

The query-level data point on which to generate responses with vllm using sampling_params (and to evaluate with evaluator).

| | Type | Default | Details |
|---|---|---|---|
| dataset | str | | The name of the dataset that the query belongs to, e.g. “math”. |
| query | str | | Raw query, without other prompt. |
| ref_ans | str | | The short reference answer to the query. |
| prompt_template | PromptTemplate | alpaca | The prompt template object to use. |
| n_shots | int | -1 | Number of examples in the few-shot prompt. Negative means adaptive to the datasets. |
| n_trials | int | 0 | Number of raw responses already generated for the query. |
| n_corrects | int | 0 | Number of correct responses already generated for the query. |
| max_n_trials | int \| None | None | Maximum number of trials to generate a response, by default None. None or negative means no limit. |
| min_n_corrects | int \| None | None | Minimum number of correct responses to generate, by default None. None or negative means no limit. |
| kwargs | dict | | Other fields to store. |
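To make the counter fields concrete, here is a minimal sketch of how the “None or negative means no limit” convention could be checked. `QueryCounters`, `has_limit`, `trials_exhausted` and `corrects_satisfied` are hypothetical names for illustration, not part of dart_math, and how the library itself combines the two conditions (or treats a value of zero) is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryCounters:
    """Stand-in holding only the counter fields of QueryDataPoint (not the real class)."""

    n_trials: int = 0
    n_corrects: int = 0
    max_n_trials: Optional[int] = None
    min_n_corrects: Optional[int] = None


def has_limit(value: Optional[int]) -> bool:
    # Per the table above: None or a negative value means "no limit".
    return value is not None and value >= 0


def trials_exhausted(dp: QueryCounters) -> bool:
    # True once a trial cap is set and has been reached.
    return has_limit(dp.max_n_trials) and dp.n_trials >= dp.max_n_trials


def corrects_satisfied(dp: QueryCounters) -> bool:
    # True once a correct-response target is set and has been met.
    return has_limit(dp.min_n_corrects) and dp.n_corrects >= dp.min_n_corrects


dp = QueryCounters(n_trials=3, n_corrects=1, max_n_trials=8, min_n_corrects=2)
print(trials_exhausted(dp), corrects_satisfied(dp))  # False False
```

Keeping the two predicates separate leaves the stopping policy open, which seems appropriate since this page does not specify it.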


RespSampleBase

 RespSampleBase (dataset:str, query:str, ref_ans:str, resp:str, agent:str,
                 prompt_template:str=None, ans:str=None,
                 correct:bool=None)

The response-level data point containing the query-level data point and other response-level information.

| | Type | Default | Details |
|---|---|---|---|
| dataset | str | | The name of the dataset that the query belongs to. |
| query | str | | The input query to generate responses on. |
| ref_ans | str | | The reference answer to the query. |
| resp | str | | The generated response. |
| agent | str | | |
| prompt_template | str | None | |
| ans | str | None | The answer in the generated response, by default None. |
| correct | bool | None | Whether the generated response is correct, by default None. |


RespSampleVLLM

 RespSampleVLLM (dataset:str, query:str, ref_ans:str, abs_tol:float=None,
                 resp:str=None, finish_reason:str=None,
                 stop_reason:str=None, cumulative_logprob:float=None,
                 ans:str=None, correct:bool=None, **kwargs)

The response-level data point from a vllm model, containing extra fields like finish_reason, stop_reason, cumulative_logprob.

| | Type | Default | Details |
|---|---|---|---|
| dataset | str | | The name of the dataset that the query belongs to. |
| query | str | | The input query to generate responses on. |
| ref_ans | str | | The reference answer to the query. |
| abs_tol | float | None | The absolute tolerance of the answer. |
| resp | str | None | The generated response. |
| finish_reason | str | None | The reason for finishing the generation from vllm. |
| stop_reason | str | None | The reason for stopping the generation from vllm, e.g. EoS token. |
| cumulative_logprob | float | None | The cumulative log probability of the generated response. |
| ans | str | None | The answer in the generated response. |
| correct | bool | None | Whether the generated response is correct. |
| kwargs | | | Other fields to store. |
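The abs_tol field suggests that numeric answers can be compared against ref_ans within an absolute tolerance. The sketch below shows one way such a comparison could work, using only the standard library; `matches_within_abs_tol` is a hypothetical helper, and it is an assumption that dart's evaluator behaves similarly.

```python
import math


def matches_within_abs_tol(ans, ref_ans, abs_tol=None):
    """Hypothetical check: numeric match within abs_tol, else exact string match."""
    try:
        a, r = float(ans), float(ref_ans)
    except (TypeError, ValueError):
        # Non-numeric answers fall back to comparing the stripped strings.
        return str(ans).strip() == str(ref_ans).strip()
    # rel_tol=0.0 so that only the absolute tolerance is applied.
    return math.isclose(a, r, rel_tol=0.0, abs_tol=abs_tol or 0.0)


print(matches_within_abs_tol("3.14", "3.1416", abs_tol=0.01))  # True
print(matches_within_abs_tol("3.1", "3.1416", abs_tol=0.01))   # False
```

With abs_tol left as None, this sketch requires the two numbers to be equal.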