from dart_math.data import *
Preset datasets so far:

Dataset | ID | Size | Stored At | Source |
---|---|---|---|---|
MATH/Test | "math/test" | 5000 | 🤗 HuggingFace | 🤗 hendrycks/competition_math |
MATH/Train | "math/train" | 7500 | 🤗 HuggingFace | 🤗 hendrycks/competition_math |
GSM8K/Test | "gsm8k/test" | 1319 | 🤗 HuggingFace | 🤗 gsm8k |
GSM8K(Fixed)/Train (DEPRECATED: GSM8K/Train) | "gsm8k-fix/train" (DEPRECATED: "gsm8k/train") | 7473 | 🤗 HuggingFace | 🤗 gsm8k |
MWPBench/CollegeMath/Test | "mwpbench/college-math/test" | 2818 | 🎯 dart/data/dsets | 🐱 microsoft/unilm/mathscale/MWPBench |
MWPBench/CollegeMath/Train | "mwpbench/college-math/train" | 1281 | 🎯 dart/data/dsets | 🐱 microsoft/unilm/mathscale/MWPBench |
MWPBench/GaokaoBench | "mwpbench/gaokaobench" | 508 | 🎯 dart/data/dsets | 🐱 microsoft/unilm/mathscale/MWPBench |
MWPBench/FreshGaokaoMath2023 | "mwpbench/fresh-gaokao-math-2023" | 30 | 🎯 dart/data/dsets | 🐱 microsoft/unilm/mathscale/MWPBench |
DeepMind Mathematics | "deepmind-mathematics" | 1000 | 🎯 dart/data/dsets | 🐱 google-deepmind/mathematics_dataset |
OlympiadBench-Math | "olympiadbench/OE_TO_maths_en_COMP" | 675 | 🎯 dart/data/dsets | 🐱 OpenBMB/OlympiadBench |
TheoremQA | "theoremqa" | 800 | 🎯 dart/data/dsets | 🐱 TIGER-AI-Lab/TheoremQA |
Odyssey-Math | "odyssey-math" | 386 | 🎯 dart/data/dsets | 🐱 protagolabs/odyssey-math |
AOPS | "aops" | 3886 | 🎯 dart/data/dsets | 🌐 AOPS |
For other datasets, please refer to `load_query_dps` and add them yourself.
`load_query_dps(dataset: str | list[str] = 'math-test', max_n_trials: int | list[int] = 1, min_n_corrects: int | list[int] = 0, prompt_template: str = 'alpaca', n_shots: int = -1)`
Load dataset(s) as `QueryDataPoint`s. If needed, please add datasets here following the format of the existing datasets, or specify the dataset `.json` path, whose stem name is used as the dataset ID.
| | Type | Default | Details |
|---|---|---|---|
| dataset | str \| list[str] | math-test | (List of) dataset ID, or path to a dataset of samples with "query" and "ref_ans" fields. If a path is given, the two following arguments are ignored. |
| max_n_trials | int \| list[int] | 1 | (List of) maximum number of raw responses to generate for each dataset. A non-positive value or None means no limit. |
| min_n_corrects | int \| list[int] | 0 | (List of) minimum number of correct responses to generate for each dataset. A non-positive value or None means no requirement. |
| prompt_template | str | alpaca | ID / path of the prompt template. |
| n_shots | int | -1 | Number of examples in the few-shot prompt. Negative means adaptive to the dataset. |
| Returns | list | | `QueryDataPoint`s to be input to `dart.gen.gen`. |
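As a minimal usage sketch based on the signature above (the argument values are illustrative):

```python
from dart_math.data import load_query_dps

# Load the MATH test set as query-level data points, capping generation at
# 4 raw responses per query and requiring at least 1 correct response.
query_dps = load_query_dps(
    dataset="math/test",
    max_n_trials=4,
    min_n_corrects=1,
    prompt_template="alpaca",
)

print(len(query_dps))      # 5000, per the preset dataset table above
print(query_dps[0].query)  # raw query text of the first data point
```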
We unify the data format across `dart`.
`QueryDataPoint(dataset: str, query: str, ref_ans: str, prompt_template: dart_math.utils.PromptTemplate = 'alpaca', n_shots: int = -1, n_trials: int = 0, n_corrects: int = 0, max_n_trials: int | None = None, min_n_corrects: int | None = None, **kwargs: dict[str, typing.Any])`

The query-level data point on which to generate responses with `vllm` (using `sampling_params`) and evaluate with `evaluator`.
| | Type | Default | Details |
|---|---|---|---|
| dataset | str | | The dataset name that the query belongs to, e.g. "math". |
| query | str | | Raw query, without any other prompt. |
| ref_ans | str | | The short reference answer to the query. |
| prompt_template | PromptTemplate | alpaca | The prompt template object to use. |
| n_shots | int | -1 | Number of examples in the few-shot prompt. Negative means adaptive to the dataset. |
| n_trials | int | 0 | Number of raw responses already generated for the query. |
| n_corrects | int | 0 | Number of correct responses already generated for the query. |
| max_n_trials | int \| None | None | Maximum number of raw responses to generate for the query. None or a negative value means no limit. |
| min_n_corrects | int \| None | None | Minimum number of correct responses to generate for the query. None or a negative value means no requirement. |
| kwargs | dict | | Other fields to store. |
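For illustration, a custom query-level data point might be constructed directly; all field values below are made up, and `level` is just an example of an extra field kept via `**kwargs`:

```python
from dart_math.data import QueryDataPoint

dp = QueryDataPoint(
    dataset="math",
    query="What is the remainder when 2^10 is divided by 7?",
    ref_ans="2",
    max_n_trials=8,    # stop after at most 8 raw responses ...
    min_n_corrects=1,  # ... or once 1 correct response is collected
    level=1,           # extra field stored via **kwargs (illustrative)
)
```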
`RespSampleBase(dataset: str, query: str, ref_ans: str, resp: str, agent: str, prompt_template: str = None, ans: str = None, correct: bool = None)`
The response-level data point containing the query-level data point and other response-level information.
| | Type | Default | Details |
|---|---|---|---|
| dataset | str | | The dataset name that the query belongs to. |
| query | str | | The input query to generate responses on. |
| ref_ans | str | | The reference answer to the query. |
| resp | str | | The generated response. |
| agent | str | | |
| prompt_template | str | None | |
| ans | str | None | The answer in the generated response, by default None. |
| correct | bool | None | Whether the generated response is correct, by default None. |
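As a sketch of this unified format, a fully-populated base sample might look like this (all values, including the agent name, are illustrative):

```python
from dart_math.data import RespSampleBase

sample = RespSampleBase(
    dataset="math",
    query="What is the remainder when 2^10 is divided by 7?",
    ref_ans="2",
    resp="2^10 = 1024 = 7 * 146 + 2, so the remainder is 2.",
    agent="dart-math-llama3-8b",  # hypothetical agent/model name
    ans="2",                      # answer extracted from the response
    correct=True,                 # judged against ref_ans
)
```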
`RespSampleVLLM(dataset: str, query: str, ref_ans: str, abs_tol: float = None, resp: str = None, finish_reason: str = None, stop_reason: str = None, cumulative_logprob: float = None, ans: str = None, correct: bool = None, **kwargs)`

The response-level data point from a `vllm` model, containing extra fields like `finish_reason`, `stop_reason`, and `cumulative_logprob`.
| | Type | Default | Details |
|---|---|---|---|
| dataset | str | | The dataset name that the query belongs to. |
| query | str | | The input query to generate responses on. |
| ref_ans | str | | The reference answer to the query. |
| abs_tol | float | None | The absolute tolerance of the answer. |
| resp | str | None | The generated response. |
| finish_reason | str | None | The reason for finishing the generation, from `vllm`. |
| stop_reason | str | None | The reason for stopping the generation, from `vllm`, e.g. an EoS token. |
| cumulative_logprob | float | None | The cumulative log probability of the generated response. |
| ans | str | None | The answer in the generated response, by default None. |
| correct | bool | None | Whether the generated response is correct. |
| kwargs | | | Other fields to store. |
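For instance, these extra fields let you filter generated samples by correctness and generation metadata. The helper below is a hypothetical sketch, relying on the vllm convention that `finish_reason == "length"` marks outputs truncated by the token limit:

```python
from dart_math.data import RespSampleVLLM

def keep_clean_corrects(samples: list[RespSampleVLLM]) -> list[RespSampleVLLM]:
    """Keep correct samples whose generation was not truncated.

    In vllm, `finish_reason == "length"` marks outputs cut off by the
    token limit; other values indicate the generation stopped normally.
    """
    return [s for s in samples if s.correct and s.finish_reason != "length"]
```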