```python
from dart_math.data import *
```
Preset datasets so far:
| Dataset | ID | Size | Stored At | Source |
|---|---|---|---|---|
| MATH/Test | "math/test" | 5000 | 🤗 HuggingFace | 🤗 hendrycks/competition_math |
| MATH/Train | "math/train" | 7500 | 🤗 HuggingFace | 🤗 hendrycks/competition_math |
| GSM8K/Test | "gsm8k/test" | 1319 | 🤗 HuggingFace | 🤗 gsm8k |
| GSM8K(Fixed)/Train (DEPRECATED: GSM8K/Train) | "gsm8k-fix/train" (DEPRECATED: "gsm8k/train") | 7473 | 🤗 HuggingFace | 🤗 gsm8k |
| MWPBench/CollegeMath/Test | "mwpbench/college-math/test" | 2818 | 🎯 dart/data/dsets | 🐱 microsoft/unilm/mathscale/MWPBench |
| MWPBench/CollegeMath/Train | "mwpbench/college-math/train" | 1281 | 🎯 dart/data/dsets | 🐱 microsoft/unilm/mathscale/MWPBench |
| MWPBench/GaokaoBench | "mwpbench/gaokaobench" | 508 | 🎯 dart/data/dsets | 🐱 microsoft/unilm/mathscale/MWPBench |
| MWPBench/FreshGaokaoMath2023 | "mwpbench/fresh-gaokao-math-2023" | 30 | 🎯 dart/data/dsets | 🐱 microsoft/unilm/mathscale/MWPBench |
| DeepMind Mathematics | "deepmind-mathematics" | 1000 | 🎯 dart/data/dsets | 🐱 google-deepmind/mathematics_dataset |
| OlympiadBench-Math | "olympiadbench/OE_TO_maths_en_COMP" | 675 | 🎯 dart/data/dsets | 🐱 OpenBMB/OlympiadBench |
| TheoremQA | "theoremqa" | 800 | 🎯 dart/data/dsets | 🐱 TIGER-AI-Lab/TheoremQA |
| Odyssey-Math | "odyssey-math" | 386 | 🎯 dart/data/dsets | 🐱 protagolabs/odyssey-math |
| AOPS | "aops" | 3886 | 🎯 dart/data/dsets | AOPS |
For other datasets, please refer to `load_query_dps` below and add them yourself.
```python
load_query_dps(
    dataset: str | list[str] = 'math-test',
    max_n_trials: int | list[int] = 1,
    min_n_corrects: int | list[int] = 0,
    prompt_template: str = 'alpaca',
    n_shots: int = -1,
)
```
Load dataset(s) as `QueryDataPoint`s. If needed, add datasets here following the format of the existing ones, or specify the path to a dataset `.json` file, whose stem name serves as the dataset ID.
| | Type | Default | Details |
|---|---|---|---|
| dataset | str \| list[str] | math-test | (List of) dataset ID, or path to a dataset of samples with "query" and "ref_ans" fields. When a path is given, the other two arguments are ignored. |
| max_n_trials | int \| list[int] | 1 | (List of) maximum number of raw responses to generate for each dataset. A non-positive value or None means no limit. |
| min_n_corrects | int \| list[int] | 0 | (List of) minimum number of correct responses to generate for each dataset. A non-positive value or None means no limit. |
| prompt_template | str | alpaca | ID / path of the prompt template. |
| n_shots | int | -1 | Number of examples in the few-shot prompt. Negative means adaptive to the dataset. |
| Returns | list | | `QueryDataPoint`s to be input to `dart.gen.gen`. |
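A minimal usage sketch, relying only on the signature and the preset IDs from the table above (the chosen datasets and limit values are illustrative):

```python
from dart_math.data import load_query_dps

# Load two preset datasets at once; per-dataset limits are given as
# lists aligned with the dataset list.
query_dps = load_query_dps(
    dataset=["math/test", "gsm8k/test"],
    max_n_trials=[1, 1],    # at most 1 raw response per query
    min_n_corrects=[0, 0],  # no minimum on correct responses
    prompt_template="alpaca",
    n_shots=-1,             # adaptive few-shot count per dataset
)
print(len(query_dps))  # expected 5000 + 1319 data points, per the table above
```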
We unify the data format across dart.
```python
QueryDataPoint(
    dataset: str,
    query: str,
    ref_ans: str,
    prompt_template: dart_math.utils.PromptTemplate = 'alpaca',
    n_shots: int = -1,
    n_trials: int = 0,
    n_corrects: int = 0,
    max_n_trials: int | None = None,
    min_n_corrects: int | None = None,
    **kwargs: dict[str, typing.Any],
)
```
The query-level data point on which to generate responses with `vllm` using `sampling_params` (and to evaluate them with `evaluator`).
| | Type | Default | Details |
|---|---|---|---|
| dataset | str | | The dataset name that the query belongs to, e.g. "math". |
| query | str | | Raw query, without any other prompt. |
| ref_ans | str | | The short reference answer to the query. |
| prompt_template | PromptTemplate | alpaca | The prompt template object to use. |
| n_shots | int | -1 | Number of examples in the few-shot prompt. Negative means adaptive to the dataset. |
| n_trials | int | 0 | Number of raw responses already generated for the query. |
| n_corrects | int | 0 | Number of correct responses already generated for the query. |
| max_n_trials | int \| None | None | Maximum number of responses to generate for the query, by default None. None or a negative value means no limit. |
| min_n_corrects | int \| None | None | Minimum number of correct responses to generate for the query, by default None. None or a negative value means no limit. |
| kwargs | dict | | Other fields to store. |
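For a custom dataset, the fields above can be filled in directly; a sketch under the documented signature (the `difficulty` field is a hypothetical extra, kept via `**kwargs`):

```python
from dart_math.data import QueryDataPoint

dp = QueryDataPoint(
    dataset="my-dataset",
    query="What is 2 + 2?",
    ref_ans="4",
    n_shots=0,          # zero-shot prompting
    max_n_trials=4,     # stop after 4 raw responses ...
    min_n_corrects=1,   # ... or once 1 correct response is collected
    difficulty="easy",  # hypothetical extra field, stored via **kwargs
)
```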
```python
RespSampleBase(
    dataset: str,
    query: str,
    ref_ans: str,
    resp: str,
    agent: str,
    prompt_template: str = None,
    ans: str = None,
    correct: bool = None,
)
```
The response-level data point, containing the query-level information plus other response-level fields.
| | Type | Default | Details |
|---|---|---|---|
| dataset | str | | The dataset name that the query belongs to. |
| query | str | | The input query to generate responses on. |
| ref_ans | str | | The reference answer to the query. |
| resp | str | | The generated response. |
| agent | str | | The agent (e.g. model) that generated the response. |
| prompt_template | str | None | ID / path of the prompt template used. |
| ans | str | None | The answer in the generated response, by default None. |
| correct | bool | None | Whether the generated response is correct, by default None. |
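A sketch of a fully populated base sample, using only the fields documented above (all values, including the `agent` label, are illustrative):

```python
from dart_math.data import RespSampleBase

sample = RespSampleBase(
    dataset="gsm8k",
    query="Natalia sold 48 clips in April and half as many in May. How many in total?",
    ref_ans="72",
    resp="48 + 24 = 72. The answer is 72.",
    agent="my-model",  # illustrative agent/model name
    ans="72",          # answer extracted from resp
    correct=True,      # set after checking ans against ref_ans
)
```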
```python
RespSampleVLLM(
    dataset: str,
    query: str,
    ref_ans: str,
    abs_tol: float = None,
    resp: str = None,
    finish_reason: str = None,
    stop_reason: str = None,
    cumulative_logprob: float = None,
    ans: str = None,
    correct: bool = None,
    **kwargs,
)
```
The response-level data point from a vLLM model, containing extra fields like `finish_reason`, `stop_reason`, and `cumulative_logprob`.
| | Type | Default | Details |
|---|---|---|---|
| dataset | str | | The dataset name that the query belongs to. |
| query | str | | The input query to generate responses on. |
| ref_ans | str | | The reference answer to the query. |
| abs_tol | float | None | The absolute tolerance for the answer. |
| resp | str | None | The generated response. |
| finish_reason | str | None | The reason vLLM reports for finishing the generation. |
| stop_reason | str | None | The reason vLLM reports for stopping the generation, e.g. the EoS token. |
| cumulative_logprob | float | None | The cumulative log probability of the generated response. |
| ans | str | None | The answer in the generated response, by default None. |
| correct | bool | None | Whether the generated response is correct, by default None. |
| kwargs | | | Other fields to store. |
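The extra fields mirror per-generation metadata reported by vLLM. In practice these samples come out of the generation pipeline (e.g. `dart.gen.gen`) rather than being hand-built, so the values in this sketch are purely illustrative:

```python
from dart_math.data import RespSampleVLLM

sample = RespSampleVLLM(
    dataset="math",
    query="Compute $1+1$.",
    ref_ans="2",
    resp="$1+1=2$. The final answer is $2$.",
    finish_reason="stop",     # vLLM hit a stop condition
    stop_reason=None,         # e.g. the EoS token, when applicable
    cumulative_logprob=-3.2,  # illustrative log-probability
    ans="2",
    correct=True,
)
```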