Data Loading And Comparison
Load, save, summarize, and compare processed EEG arrays.
Classes:
| Name | Description |
|---|---|
| `DatasetBundle` | Container for the package's six processed arrays. |
Functions:
| Name | Description |
|---|---|
| `find_processed_data_dir` | Find a directory containing all six processed data files. |
| `load_processed_arrays` | Load processed EEG arrays. |
| `save_processed_arrays` | Save a processed data bundle in the canonical six-file layout. |
| `sha256_file` | Compute a SHA256 hash for a file. |
| `summarize_processed_dir` | Summarize processed arrays for manifests and provenance reports. |
| `compare_processed_dirs` | Compare generated processed arrays with a reference directory. |
| `write_json_report` | Write a JSON report with stable formatting. |
DatasetBundle (dataclass)
DatasetBundle(
X_train_valid: ndarray,
y_train_valid: ndarray,
person_train_valid: ndarray,
X_test: ndarray,
y_test: ndarray,
person_test: ndarray,
)
Container for the package's six processed arrays.
Attributes:
| Name | Type | Description |
|---|---|---|
| `X_train_valid` | `ndarray` | Training/validation EEG trials with shape |
| `y_train_valid` | `ndarray` | BCI cue labels for |
| `person_train_valid` | `ndarray` | Subject IDs for |
| `X_test` | `ndarray` | Held-out EEG trials with shape |
| `y_test` | `ndarray` | BCI cue labels for |
| `person_test` | `ndarray` | Subject IDs for |
Methods:
| Name | Description |
|---|---|
| `as_dict` | Return arrays keyed by their canonical |
find_processed_data_dir
Find a directory containing all six processed data files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `candidates` | `tuple[Path, ...]` | Candidate directories to inspect in order. | `LOCAL_CACHE_CANDIDATES` |
Returns:
| Type | Description |
|---|---|
| `Path` | First directory that contains every canonical processed |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If no candidate contains the complete processed dataset. |
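The search behavior documented for `find_processed_data_dir` (inspect candidates in order, return the first complete one, otherwise raise `FileNotFoundError`) can be sketched as follows. This is a minimal illustration, not the package's implementation: `find_dir_sketch` is a hypothetical stand-in, and the file names in `ASSUMED_FILES` are guesses based on the `DatasetBundle` field names, since this reference does not list the canonical names.

```python
from pathlib import Path

# Assumption: the six canonical files are .npy files named after the
# DatasetBundle fields. The reference does not spell these names out.
ASSUMED_FILES = (
    "X_train_valid.npy",
    "y_train_valid.npy",
    "person_train_valid.npy",
    "X_test.npy",
    "y_test.npy",
    "person_test.npy",
)


def find_dir_sketch(candidates: tuple[Path, ...]) -> Path:
    """Return the first candidate directory that holds every assumed file."""
    for candidate in candidates:
        # A missing directory simply fails the is_file() checks.
        if all((candidate / name).is_file() for name in ASSUMED_FILES):
            return candidate
    raise FileNotFoundError(
        f"No candidate contains all {len(ASSUMED_FILES)} processed files."
    )
```

Because candidates are checked in order, placing a local cache ahead of a slower location makes the cache win whenever it is complete.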
load_processed_arrays
load_processed_arrays(
data_dir: Path | str | None = None,
mmap_mode: str | None = None,
) -> DatasetBundle
Load processed EEG arrays.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data_dir` | `Path \| str \| None` | Directory containing the six | `None` |
| `mmap_mode` | `str \| None` | Optional NumPy memory-map mode such as | `None` |
Returns:
| Type | Description |
|---|---|
| `DatasetBundle` | A |
save_processed_arrays
save_processed_arrays(
bundle: DatasetBundle, output_dir: Path | str
) -> None
Save a processed data bundle in the canonical six-file layout.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `bundle` | `DatasetBundle` | Arrays to save. | required |
| `output_dir` | `Path \| str` | Destination directory. | required |
sha256_file
Compute a SHA256 hash for a file.
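The summary table describes `sha256_file` as computing a SHA256 hash for a file. A minimal sketch of how such a hash is commonly computed with the standard library is shown here. `sha256_file_sketch` and its `chunk_size` parameter are hypothetical; the package's own signature does not appear in this reference.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_file_sketch(path: Union[Path, str], chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large arrays need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use bounded by `chunk_size`, which matters for multi-gigabyte array files.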
summarize_processed_dir
Summarize processed arrays for manifests and provenance reports.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data_dir` | `Path \| str` | Directory containing processed | required |
| `include_hashes` | `bool` | Whether to compute SHA256 hashes for each file. | `True` |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | JSON-serializable manifest with source URLs, shapes, dtypes, counts, and hashes. |
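Because the manifest is JSON-serializable, it can be written to disk reproducibly. The sketch below shows one common reading of "stable formatting" (sorted keys, fixed indentation, trailing newline). That reading is an assumption, and `write_json_sketch` is a hypothetical stand-in rather than the signature of `write_json_report`, which this reference does not show.

```python
import json
from pathlib import Path
from typing import Any, Union


def write_json_sketch(report: dict[str, Any], path: Union[Path, str]) -> None:
    """Write a dict as JSON so that identical content yields identical text."""
    # sort_keys removes any dependence on dict insertion order.
    text = json.dumps(report, indent=2, sort_keys=True)
    Path(path).write_text(text + "\n", encoding="utf-8")
```

Stable output keeps diffs between two manifests limited to values that actually changed.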
compare_processed_dirs
compare_processed_dirs(
generated_dir: Path | str,
reference_dir: Path | str,
atol: float = 1e-08,
rtol: float = 1e-05,
) -> dict[str, Any]
Compare generated processed arrays with a reference directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `generated_dir` | `Path \| str` | Directory containing regenerated | required |
| `reference_dir` | `Path \| str` | Directory containing reference/cache | required |
| `atol` | `float` | Absolute tolerance for floating-point comparisons. | `1e-08` |
| `rtol` | `float` | Relative tolerance for floating-point comparisons. | `1e-05` |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | JSON-serializable comparison report with hashes, shapes, counts, and numeric difference metrics. |
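The default `atol` and `rtol` values match the defaults of NumPy's `allclose`, which treats two values as close when `|a - b| <= atol + rtol * |b|`. Whether `compare_processed_dirs` applies exactly this rule is an assumption. The pure-Python sketch below only illustrates how the two tolerances combine; `is_close_sketch` and `max_abs_diff_sketch` are hypothetical helpers, not part of the package.

```python
def is_close_sketch(
    a: float, b: float, atol: float = 1e-08, rtol: float = 1e-05
) -> bool:
    """Asymmetric closeness test: b plays the role of the reference value."""
    return abs(a - b) <= atol + rtol * abs(b)


def max_abs_diff_sketch(generated: list[float], reference: list[float]) -> float:
    """Largest element-wise absolute difference, one possible difference metric."""
    if len(generated) != len(reference):
        raise ValueError("generated and reference must have the same length")
    return max((abs(g - r) for g, r in zip(generated, reference)), default=0.0)
```

Near zero the relative term vanishes, so `atol` alone decides the outcome; for large reference values `rtol` dominates.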