Data Loading And Comparison#

Load, save, summarize, and compare processed EEG arrays.

Classes:

Name Description
DatasetBundle

Container for the package's six processed arrays.

Functions:

Name Description
find_processed_data_dir

Find a directory containing all six processed data files.

load_processed_arrays

Load processed EEG arrays.

save_processed_arrays

Save a processed data bundle in the canonical six-file layout.

sha256_file

Compute a SHA256 hash for a file.

summarize_processed_dir

Summarize processed arrays for manifests and provenance reports.

compare_processed_dirs

Compare generated processed arrays with a reference directory.

write_json_report

Write a JSON report with stable formatting.

DatasetBundle dataclass #

DatasetBundle(
    X_train_valid: ndarray,
    y_train_valid: ndarray,
    person_train_valid: ndarray,
    X_test: ndarray,
    y_test: ndarray,
    person_test: ndarray,
)

Container for the package's six processed arrays.

Attributes:

Name Type Description
X_train_valid ndarray

Training/validation EEG trials with shape (n, 22, 1000).

y_train_valid ndarray

BCI cue labels for X_train_valid, using labels 769-772.

person_train_valid ndarray

Subject IDs for X_train_valid, shaped (n, 1).

X_test ndarray

Held-out EEG trials with shape (m, 22, 1000).

y_test ndarray

BCI cue labels for X_test, using labels 769-772.

person_test ndarray

Subject IDs for X_test, shaped (m, 1).
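
The documented shapes and label range can be checked before a bundle is used. The following is a minimal sketch, not part of the package: `check_split` and `VALID_LABELS` are hypothetical names, and the label array is assumed to hold one entry per trial along its first axis, since its exact shape is not stated here.

```python
import numpy as np

# Cue labels documented for y_train_valid and y_test.
VALID_LABELS = {769, 770, 771, 772}


def check_split(X, y, person):
    """Raise ValueError if one split violates the documented layout.

    Hypothetical helper for illustration; not part of the package API.
    """
    if X.ndim != 3 or X.shape[1:] != (22, 1000):
        raise ValueError(f"expected trials shaped (n, 22, 1000), got {X.shape}")
    n_trials = X.shape[0]
    # Assumption: labels hold one entry per trial along the first axis.
    if y.shape[0] != n_trials:
        raise ValueError(f"expected {n_trials} labels, got {y.shape[0]}")
    if person.shape != (n_trials, 1):
        raise ValueError(f"expected subject IDs shaped ({n_trials}, 1), got {person.shape}")
    unexpected = set(np.unique(y).tolist()) - VALID_LABELS
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")


# Tiny synthetic split, for demonstration only.
X_demo = np.zeros((4, 22, 1000), dtype=np.float32)
y_demo = np.array([769, 770, 771, 772])
person_demo = np.ones((4, 1))
check_split(X_demo, y_demo, person_demo)  # passes silently
```

The same helper would apply to both the train/validation and the test triple.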

Methods:

Name Description
as_dict

Return arrays keyed by their canonical .npy filenames.

as_dict #

as_dict() -> dict[str, ndarray]

Return arrays keyed by their canonical .npy filenames.

Returns:

Type Description
dict[str, ndarray]

Mapping from processed filename to array.

find_processed_data_dir #

find_processed_data_dir(
    candidates: tuple[Path, ...] = LOCAL_CACHE_CANDIDATES,
) -> Path

Find a directory containing all six processed data files.

Parameters:

Name Type Description Default
candidates tuple[Path, ...]

Candidate directories to inspect in order.

LOCAL_CACHE_CANDIDATES

Returns:

Type Description
Path

First directory that contains every canonical processed .npy file.

Raises:

Type Description
FileNotFoundError

If no candidate contains the complete processed dataset.
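
The search described above (inspect candidates in order, return the first complete one, otherwise raise) can be sketched as follows. This is an illustration rather than the package's code: `find_complete_dir` is a hypothetical name, and `ASSUMED_FILENAMES` assumes that the canonical filenames mirror the `DatasetBundle` attribute names.

```python
import tempfile
from pathlib import Path

# Assumption: canonical filenames mirror the DatasetBundle attribute names.
ASSUMED_FILENAMES = (
    "X_train_valid.npy",
    "y_train_valid.npy",
    "person_train_valid.npy",
    "X_test.npy",
    "y_test.npy",
    "person_test.npy",
)


def find_complete_dir(candidates):
    """Return the first candidate directory that holds every assumed file."""
    for candidate in candidates:
        directory = Path(candidate)
        if all((directory / name).is_file() for name in ASSUMED_FILENAMES):
            return directory
    raise FileNotFoundError("no candidate contains all six processed files")


# Demonstration with throwaway directories: the first one is incomplete.
with tempfile.TemporaryDirectory() as tmp:
    incomplete = Path(tmp) / "incomplete"
    complete = Path(tmp) / "complete"
    incomplete.mkdir()
    complete.mkdir()
    (incomplete / ASSUMED_FILENAMES[0]).touch()
    for name in ASSUMED_FILENAMES:
        (complete / name).touch()
    found_name = find_complete_dir((incomplete, complete)).name
    try:
        find_complete_dir((incomplete,))
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

Because candidates are checked in order, an earlier directory that is missing even one file is skipped in favor of a later complete one.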

load_processed_arrays #

load_processed_arrays(
    data_dir: Path | str | None = None,
    mmap_mode: str | None = None,
) -> DatasetBundle

Load processed EEG arrays.

Parameters:

Name Type Description Default
data_dir Path | str | None

Directory containing the six .npy files. If omitted, known local cache locations are searched.

None
mmap_mode str | None

Optional NumPy memory-map mode such as "r".

None

Returns:

Type Description
DatasetBundle

A DatasetBundle with train/validation and test arrays.
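
To illustrate what `mmap_mode` changes, here is a minimal sketch that forwards the mode to `numpy.load`. It is an assumption that the package does the same. `load_arrays_sketch` and `ASSUMED_FILENAMES` are hypothetical, and the sketch returns a plain dict rather than a `DatasetBundle`.

```python
import tempfile
from pathlib import Path

import numpy as np

# Assumption: canonical filenames mirror the DatasetBundle attribute names.
ASSUMED_FILENAMES = (
    "X_train_valid.npy",
    "y_train_valid.npy",
    "person_train_valid.npy",
    "X_test.npy",
    "y_test.npy",
    "person_test.npy",
)


def load_arrays_sketch(data_dir, mmap_mode=None):
    """Load each assumed file; mmap_mode is forwarded to numpy.load."""
    directory = Path(data_dir)
    return {
        name: np.load(directory / name, mmap_mode=mmap_mode)
        for name in ASSUMED_FILENAMES
    }


# Demonstration: write small placeholder arrays, then memory-map them.
with tempfile.TemporaryDirectory() as tmp:
    for name in ASSUMED_FILENAMES:
        np.save(Path(tmp) / name, np.zeros((2, 3)))
    arrays = load_arrays_sketch(tmp, mmap_mode="r")
    is_memmap = isinstance(arrays["X_test.npy"], np.memmap)
    loaded_shape = arrays["X_test.npy"].shape
    del arrays  # release the memory maps before the directory is removed
```

With `mmap_mode="r"`, NumPy returns read-only `numpy.memmap` objects that page data in on access, which can help when the trial arrays are larger than available memory.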

save_processed_arrays #

save_processed_arrays(
    bundle: DatasetBundle, output_dir: Path | str
) -> None

Save a processed data bundle in the canonical six-file layout.

Parameters:

Name Type Description Default
bundle DatasetBundle

Arrays to save.

required
output_dir Path | str

Destination directory.

required
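
A save step along these lines could write one `.npy` file per entry of the mapping returned by `as_dict()`. The sketch below is hedged: `save_arrays_sketch` is a hypothetical name, the filenames in the demonstration are assumed, and creating missing parent directories is a design choice of the sketch, not confirmed behavior of `save_processed_arrays`.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_arrays_sketch(arrays_by_filename, output_dir):
    """Write each array to output_dir under its mapped filename."""
    destination = Path(output_dir)
    # Assumption: missing directories are created rather than rejected.
    destination.mkdir(parents=True, exist_ok=True)
    for filename, array in arrays_by_filename.items():
        np.save(destination / filename, array)


# Demonstration with placeholder arrays and assumed filenames.
demo_arrays = {
    "X_train_valid.npy": np.zeros((2, 22, 1000), dtype=np.float32),
    "y_train_valid.npy": np.array([769, 770]),
    "person_train_valid.npy": np.zeros((2, 1)),
    "X_test.npy": np.zeros((1, 22, 1000), dtype=np.float32),
    "y_test.npy": np.array([771]),
    "person_test.npy": np.zeros((1, 1)),
}
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "nested" / "processed"
    save_arrays_sketch(demo_arrays, target)
    written = sorted(path.name for path in target.iterdir())
    reloaded_labels = np.load(target / "y_train_valid.npy")
```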

sha256_file #

sha256_file(
    path: Path | str, chunk_size: int = 1024 * 1024
) -> str

Compute a SHA256 hash for a file.

Parameters:

Name Type Description Default
path Path | str

File to hash.

required
chunk_size int

Number of bytes read per chunk.

1024 * 1024

Returns:

Type Description
str

Hex-encoded SHA256 digest.
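
Chunked hashing keeps memory use bounded by `chunk_size` regardless of file size. The following sketch shows the usual `hashlib` pattern; `sha256_chunked` is a hypothetical stand-in, not the package's implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_chunked(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, reading chunk_size bytes at a time."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration: the digest does not depend on the chunk size.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    payload = b"eeg" * 1000
    sample.write_bytes(payload)
    digest_small_chunks = sha256_chunked(sample, chunk_size=7)
    digest_default = sha256_chunked(sample)
expected_digest = hashlib.sha256(payload).hexdigest()
```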

summarize_processed_dir #

summarize_processed_dir(
    data_dir: Path | str, include_hashes: bool = True
) -> dict[str, Any]

Summarize processed arrays for manifests and provenance reports.

Parameters:

Name Type Description Default
data_dir Path | str

Directory containing processed .npy arrays.

required
include_hashes bool

Whether to compute SHA256 hashes for each file.

True

Returns:

Type Description
dict[str, Any]

JSON-serializable manifest with source URLs, shapes, dtypes, counts, and hashes.
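
As a rough illustration of a JSON-serializable per-array summary, the sketch below records shape, dtype, and optional value counts. The key names and the `summarize_array` helper are hypothetical. The actual manifest layout, including its source URLs and hashes, is not specified here and is not reproduced.

```python
import json

import numpy as np


def summarize_array(array, count_values=False):
    """Return a JSON-serializable summary of one array (hypothetical layout)."""
    summary = {"shape": list(array.shape), "dtype": str(array.dtype)}
    if count_values:
        values, counts = np.unique(array, return_counts=True)
        # Convert NumPy scalars to plain Python types so json can encode them.
        summary["counts"] = {
            str(value): int(count)
            for value, count in zip(values.tolist(), counts.tolist())
        }
    return summary


labels_demo = np.array([769, 769, 770, 772], dtype=np.int64)
label_summary = summarize_array(labels_demo, count_values=True)
encoded_summary = json.dumps(label_summary, sort_keys=True)
```

Converting shapes to lists and counts to built-in integers matters because `json` cannot encode NumPy scalar types directly.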

compare_processed_dirs #

compare_processed_dirs(
    generated_dir: Path | str,
    reference_dir: Path | str,
    atol: float = 1e-08,
    rtol: float = 1e-05,
) -> dict[str, Any]

Compare generated processed arrays with a reference directory.

Parameters:

Name Type Description Default
generated_dir Path | str

Directory containing regenerated .npy files.

required
reference_dir Path | str

Directory containing reference/cache .npy files.

required
atol float

Absolute tolerance for floating-point comparisons.

1e-08
rtol float

Relative tolerance for floating-point comparisons.

1e-05

Returns:

Type Description
dict[str, Any]

JSON-serializable comparison report with hashes, shapes, counts, and numeric difference metrics.
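
The default `atol` and `rtol` match the defaults of `numpy.allclose`. Whether the package calls `numpy.allclose` internally is an assumption. The sketch below shows how such a tolerance check and a simple difference metric could be computed for one pair of arrays; `compare_arrays_sketch` and its report keys are hypothetical.

```python
import numpy as np


def compare_arrays_sketch(generated, reference, atol=1e-08, rtol=1e-05):
    """Compare two arrays and return a JSON-serializable report (hypothetical keys)."""
    report = {
        "generated_shape": list(generated.shape),
        "reference_shape": list(reference.shape),
        "shape_match": generated.shape == reference.shape,
    }
    if not report["shape_match"]:
        # Element-wise metrics are undefined when shapes differ.
        return report
    difference = np.abs(generated.astype(np.float64) - reference.astype(np.float64))
    report["max_abs_diff"] = float(difference.max()) if difference.size else 0.0
    report["allclose"] = bool(np.allclose(generated, reference, rtol=rtol, atol=atol))
    return report


reference_demo = np.array([1.0, 2.0, 3.0])
close_report = compare_arrays_sketch(reference_demo + 1e-9, reference_demo)
far_report = compare_arrays_sketch(reference_demo + 1e-3, reference_demo)
mismatch_report = compare_arrays_sketch(np.zeros((2, 2)), reference_demo)
```

`numpy.allclose` tests `|generated - reference| <= atol + rtol * |reference|` element-wise, so the check is not symmetric and the reference array is passed second.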

write_json_report #

write_json_report(
    report: dict[str, Any], output_path: Path | str
) -> None

Write a JSON report with stable formatting.

Parameters:

Name Type Description Default
report dict[str, Any]

JSON-serializable report object.

required
output_path Path | str

Destination JSON path.

required
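
One common way to get stable formatting is to sort keys and fix the indentation, so that equal reports produce byte-identical files. The sketch below assumes that interpretation. `write_json_sketch`, the two-space indent, and the trailing newline are illustrative choices, not confirmed details of `write_json_report`.

```python
import json
import tempfile
from pathlib import Path


def write_json_sketch(report, output_path):
    """Write report as JSON with sorted keys, a fixed indent, and a trailing newline."""
    text = json.dumps(report, indent=2, sort_keys=True) + "\n"
    Path(output_path).write_text(text, encoding="utf-8")


# Demonstration: key insertion order does not affect the bytes written.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.json"
    second = Path(tmp) / "second.json"
    write_json_sketch({"b": 1, "a": [1, 2]}, first)
    write_json_sketch({"a": [1, 2], "b": 1}, second)
    same_bytes = first.read_bytes() == second.read_bytes()
    round_trip = json.loads(first.read_text(encoding="utf-8"))
```

Deterministic output of this kind keeps diffs small when reports are committed to version control.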