Open‑Source Nihon Kohden → EDF(+) Converter: Python Tools and Examples

Converting electrophysiological recordings from proprietary formats into open standards is a common need in clinical research, neurophysiology, and sleep medicine. Nihon Kohden's clinical EEG/PSG file formats are widely used in hospitals and labs, but their closed or semi-closed nature creates friction for sharing data, applying open-source analysis tools, and long-term archiving. This article explains why converting Nihon Kohden files to the EDF(+) standard is useful, outlines legal and ethical considerations, reviews open-source Python tools that can help, and provides practical examples (including code) for building a reliable converter that preserves signals, annotations, and metadata.
Why convert Nihon Kohden to EDF(+)?
- Interoperability: EDF(+) (European Data Format plus) is an open, widely supported format for storing multichannel biological signals (EEG, PSG, ECG, EMG). Converting enables interoperability with tools like MNE-Python, EEGLAB (via conversion), and many commercial and research packages.
- Long-term accessibility: Open formats reduce vendor lock-in and make long-term archiving and reuse easier.
- Reproducibility & sharing: Many journals and data repositories prefer or require open formats for reproducible research.
- Tooling: EDF(+) supports annotations and event markers, which makes downstream analysis (like sleep staging or seizure detection) simpler.
Legal & ethical considerations
- Confirm that you have the right to convert and share the data. Patient data may be protected by HIPAA, GDPR, or local laws. De-identify or anonymize patient identifiers before sharing.
- Proprietary file formats may be subject to licensing terms. Check Nihon Kohden’s user agreement for any constraints on reading or converting files.
- When publishing or sharing converted datasets, include clear provenance metadata that documents source files, conversion methods, software versions, and any de-identification steps.
Overview of Nihon Kohden formats
Nihon Kohden devices generate several file types (extensions vary by device/firmware), commonly including combinations like:
- A binary waveform file (.EEG on EEG-1100/1200-class systems) typically accompanied by companion files such as .PNT (patient information), .LOG (annotations/events), and .21E (electrode names); other devices use different proprietary binary blocks containing sample data and header records
- Accompanying text or XML files with metadata and annotations
Because formats can vary across device models and firmware versions, a robust converter must:
- Parse the header to extract sampling rates, channel labels, calibration/gain, and timebase.
- Read multi-byte binary samples with correct endianness and sample encoding (e.g., 16-bit signed integers, 24-bit, or floats).
- Handle annotations and event markers, mapping them to EDF(+) annotations with correct timestamps.
- Preserve channel types (EEG, ECG, EMG, EOG), units (microvolts), and physical/digital scaling.
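As a concrete example of the scaling step, here is a minimal sketch that decodes 16-bit little-endian signed samples and applies a per-channel gain and offset. The `gain_uv_per_bit` and `offset_uv` parameters are hypothetical header-derived values; real Nihon Kohden files store calibration differently depending on the device.

```python
import numpy as np


def raw_to_physical(raw_bytes, gain_uv_per_bit, offset_uv=0.0):
    """Decode 16-bit little-endian signed samples and scale to microvolts.

    gain_uv_per_bit and offset_uv are assumed to come from the parsed
    header; adapt to your device's actual calibration scheme.
    """
    dig = np.frombuffer(raw_bytes, dtype='<i2').astype(np.float64)
    return dig * gain_uv_per_bit + offset_uv


# Two samples: 0x0001 -> +1 digital, 0xFFFF -> -1 digital
phys = raw_to_physical(b'\x01\x00\xff\xff', gain_uv_per_bit=0.5)
```

With a gain of 0.5 µV/bit this yields physical values of 0.5 and -0.5 µV for the two samples.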
Open-source Python tools to help
Below are open-source Python libraries and utilities useful for reading proprietary files, manipulating signals, and writing EDF(+) files.
- mne (https://mne.tools): Mature toolbox for EEG/MEG analysis. Supports reading EDF/EDF+ and many other formats, and includes a native Nihon Kohden reader, `mne.io.read_raw_nihon`, for .EEG recordings with their companion .PNT/.LOG/.21E files; provides data structures (Raw, Epochs, Events) and I/O utilities.
- pyedflib (https://github.com/holgern/pyedflib): Lightweight EDF/EDF+ reader–writer in Python. Good for programmatic EDF(+) creation with control over headers and annotations.
- numpy, scipy: Core numerical libraries for handling arrays, resampling, filtering, and conversions.
- construct (https://construct.readthedocs.io): Declarative binary parsing library useful when reverse-engineering proprietary binary headers.
- pandas: Helpful for handling tabular metadata and annotations.
- h5py: Useful if you want to store or inspect intermediate data in HDF5 during conversion.
- Vendor SDKs / export tools: Some Nihon Kohden devices or exporters ship with SDKs or official export utilities; check vendor documentation for official APIs (pySerial may help if data must be pulled over a serial link).
Note: There is no single official, universal open-source reader covering every Nihon Kohden variant. MNE's reader handles the common .EEG family, but other device models and file types often require implementing file parsing yourself.
Design of a robust converter
A minimal converter should:
- Identify file type and version.
- Parse header metadata (patient ID, recording start time, channel list, sampling rates, gains, filters).
- Extract continuous sample data per channel, applying any required scaling to physical units.
- Extract annotations/events and map them to EDF(+) annotation format (onset, duration, description).
- Validate signal lengths and timestamps; handle dropped samples or discontinuities.
- Write EDF(+) using a tested library, ensuring correct header fields and annotation blocks.
- Optionally offer batch mode, resampling, channel selection, and de-identification.
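The validation step ("handle dropped samples or discontinuities") can be sketched as a simple length check against the duration recorded in the header. This is an illustrative helper, not part of any vendor API; `tol_samples` is an assumed tolerance for rounding at record boundaries.

```python
def check_continuity(n_samples, fs, expected_duration_s, tol_samples=1):
    """Return the number of missing (or surplus, negative) samples.

    Compares the actual channel length against the sample count implied
    by the header's recording duration; differences within tol_samples
    are treated as rounding and reported as 0.
    """
    expected = int(round(expected_duration_s * fs))
    missing = expected - n_samples
    return missing if abs(missing) > tol_samples else 0
```

A nonzero result signals dropped samples or a discontinuity that should be logged, padded, or split into separate EDF segments depending on your policy.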
Example workflow using pyedflib + custom reader
Below is an illustrative example showing how to structure a converter. This uses a placeholder function read_nihon_kohden(…) which you must implement or replace with device-specific parsing code. The example demonstrates how to take raw signals, channel metadata, and annotations and write them to an EDF(+) file with pyedflib.
```python
# requirements:
#   pip install numpy pyedflib

import numpy as np
import pyedflib


def read_nihon_kohden(file_path):
    """
    Placeholder reader: implement according to your Nihon Kohden file format.

    Should return:
      - signals: list of numpy arrays (one per channel) in physical units
        (e.g., microvolts)
      - fs: int, or list of sampling frequencies per channel
      - chan_labels: list of channel labels
      - phys_min, phys_max: lists of physical min/max per channel
      - dig_min, dig_max: lists of digital min/max (typically -32768..32767)
      - start_time: datetime of recording start
      - annotations: list of dicts with keys
        {'onset': seconds_from_start, 'duration': seconds, 'description': text}
    """
    raise NotImplementedError("Fill in Nihon Kohden file parsing here")


def write_edf_plus(out_path, signals, fs, chan_labels, phys_min, phys_max,
                   dig_min, dig_max, start_time, annotations):
    n_channels = len(signals)
    max_len = max(len(s) for s in signals)

    # Ensure all channels have the same length by zero-padding if needed.
    sigs = np.zeros((n_channels, max_len), dtype=np.float64)
    for i, s in enumerate(signals):
        sigs[i, :len(s)] = s

    f = pyedflib.EdfWriter(out_path, n_channels=n_channels,
                           file_type=pyedflib.FILETYPE_EDFPLUS)
    try:
        channel_info = []
        for i in range(n_channels):
            channel_info.append({
                'label': chan_labels[i],
                'dimension': 'uV',
                # older pyedflib versions use the key 'sample_rate' instead
                'sample_frequency': fs if isinstance(fs, int) else fs[i],
                'physical_min': phys_min[i],
                'physical_max': phys_max[i],
                'digital_min': dig_min[i],
                'digital_max': dig_max[i],
                'transducer': '',
                'prefilter': '',
            })

        f.setPatientCode("")  # de-identify or set patient fields as needed
        f.setTechnician("")
        f.setRecordingAdditional("Converted from Nihon Kohden")
        f.setStartdatetime(start_time)
        f.setSignalHeaders(channel_info)
        f.writeSamples(sigs)

        for ann in annotations:
            onset = float(ann['onset'])
            duration = float(ann.get('duration', 0.0))
            desc = ann.get('description', '')
            f.writeAnnotation(onset, duration, desc)
    finally:
        f.close()


# Example usage:
if __name__ == "__main__":
    src = "example.nk"  # replace with a real file
    out = "converted.edf"
    (signals, fs, chan_labels, phys_min, phys_max,
     dig_min, dig_max, start_time, annotations) = read_nihon_kohden(src)
    write_edf_plus(out, signals, fs, chan_labels, phys_min, phys_max,
                   dig_min, dig_max, start_time, annotations)
```
Notes:
- read_nihon_kohden must decode binary samples, apply per-channel scaling, and return arrays in physical units (microvolts).
- pyedflib expects samples in physical units and will scale them to the specified digital ranges.
- If channels have different sampling rates, you can either resample to a common rate or write them natively: EDF supports a per-channel number of samples per data record, but many downstream tools assume a single rate. Decide based on your downstream needs.
Handling common challenges
- Variable sampling rates: If channels differ in sampling frequency, either resample (scipy.signal.resample_poly) or write EDF with per-channel sample counts and accurate sample_rate header fields. Document choices.
- Large files: Use chunked reading and streaming writes to avoid excessive memory usage. pyedflib supports writing in blocks.
- Annotations with sub-second precision: EDF(+) supports fractional-onset annotations; ensure you convert timestamps precisely.
- Missing metadata: If patient or recording metadata is missing, populate required EDF fields with placeholders and record provenance in the recording additional field.
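For the variable-sampling-rate case, a minimal sketch of the resampling option using `scipy.signal.resample_poly` (the helper and the example rates are illustrative, not taken from any real recording):

```python
import numpy as np
from fractions import Fraction
from scipy.signal import resample_poly


def resample_to_common(signal, fs_in, fs_out):
    """Polyphase-resample one channel from fs_in to fs_out Hz."""
    frac = Fraction(fs_out, fs_in).limit_denominator(1000)
    return resample_poly(signal, frac.numerator, frac.denominator)


# 1 second of a 5 Hz sine sampled at 500 Hz, brought down to 200 Hz
x = np.sin(2 * np.pi * 5 * np.arange(0, 1, 1 / 500))
y = resample_to_common(x, fs_in=500, fs_out=200)
```

Polyphase resampling applies an anti-aliasing filter internally, which matters when downsampling EMG or ECG channels to match an EEG rate.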
Example: parsing a simple binary header (pattern example)
When reverse-engineering proprietary formats, declarative parsing with construct or manual struct unpacking helps. Below is a conceptual snippet using struct on a simplistic Nihon Kohden-like header, for illustration only; adapt it to the real format.
```python
import struct
from datetime import datetime


def parse_simple_header(fp):
    # Example: a 64-byte header whose first 14 bytes hold an ASCII
    # recording start timestamp "YYYYMMDDhhmmss".
    hdr = fp.read(64)
    start_str = hdr[:14].decode('ascii')
    start_time = datetime.strptime(start_str, "%Y%m%d%H%M%S")

    # Next: channel count (2 bytes), sampling rate (4 bytes), etc.
    # This is illustrative: the real format will differ.
    channel_count = struct.unpack('<H', hdr[14:16])[0]
    fs = struct.unpack('<I', hdr[16:20])[0]
    return start_time, channel_count, fs
```
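To sanity-check a header parser without a real file, you can feed it synthetic bytes built to the same assumed layout (14 ASCII timestamp bytes, a little-endian uint16 channel count, a uint32 sampling rate, zero-padded to 64 bytes):

```python
import io
import struct
from datetime import datetime

# Build a synthetic 64-byte header in the illustrative layout above.
hdr = (b"20240115083000"
       + struct.pack('<H', 32)      # channel count
       + struct.pack('<I', 500))    # sampling rate in Hz
hdr = hdr.ljust(64, b'\x00')
fp = io.BytesIO(hdr)

# Parse it back the same way the header parser would.
raw = fp.read(64)
start_time = datetime.strptime(raw[:14].decode('ascii'), "%Y%m%d%H%M%S")
channel_count = struct.unpack('<H', raw[14:16])[0]
fs = struct.unpack('<I', raw[16:20])[0]
```

Round-tripping synthetic headers like this makes a convenient unit test long before you have captured real device files.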
Validation & testing
- Compare signal statistics (min/max, mean, PSD) before and after conversion to ensure fidelity.
- Check annotations and event timings on a timeline plot.
- Load converted EDF(+) in MNE or EDF viewers (Polyman, EDFbrowser) to verify channels and annotations.
- Unit test the parsing of headers and sample extraction with known test files.
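The statistics comparison in the first point can be sketched as a small helper. The tolerances are assumptions: EDF quantizes samples to 16-bit integers, so exact equality is not expected after a round trip.

```python
import numpy as np


def signal_stats(x):
    """Summary statistics used to compare pre/post-conversion fidelity."""
    return {'min': float(np.min(x)), 'max': float(np.max(x)),
            'mean': float(np.mean(x)), 'std': float(np.std(x))}


def assert_fidelity(before, after, rtol=1e-3, atol=1e-2):
    """Raise if any summary statistic drifts beyond tolerance."""
    a, b = signal_stats(before), signal_stats(after)
    for k in a:
        if not np.isclose(a[k], b[k], rtol=rtol, atol=atol):
            raise AssertionError(f"{k} drifted: {a[k]} vs {b[k]}")


# Mimic quantization loss and check it stays within tolerance.
x = np.linspace(-100.0, 100.0, 1000)
x_quantized = np.round(x, 1)
assert_fidelity(x, x_quantized)
```

In practice, `before` would be the arrays returned by your reader and `after` the signals read back from the converted EDF with `pyedflib.EdfReader` or MNE.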
Packaging and distribution
- License: Choose a permissive license (MIT, BSD) or copyleft (GPL) depending on your goals. Ensure compliance with any vendor constraints.
- CLI: Provide a simple command-line entrypoint (argparse) for batch conversion, de-identification flags, channel mapping, and logging.
- Docker: Offer a Docker image for reproducible environments.
- Tests: Include sample files (if licensing permits) or synthetic data for CI tests.
- Documentation: Provide clear instructions on supported Nihon Kohden variants and how to add new parsers.
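A minimal argparse entrypoint for the CLI point might look like the following. The program name and flag names (`--deidentify`, `--channels`, etc.) are suggestions, not an existing tool's interface.

```python
import argparse


def build_parser():
    """Build the CLI skeleton for a hypothetical nk2edf converter."""
    p = argparse.ArgumentParser(
        prog="nk2edf",
        description="Convert Nihon Kohden recordings to EDF(+)")
    p.add_argument("inputs", nargs="+",
                   help="input file(s) or directories for batch mode")
    p.add_argument("-o", "--out-dir", default=".",
                   help="output directory for converted EDF files")
    p.add_argument("--deidentify", action="store_true",
                   help="blank patient-identifying header fields")
    p.add_argument("--channels",
                   help="comma-separated subset of channel labels to keep")
    p.add_argument("-v", "--verbose", action="store_true",
                   help="enable debug logging")
    return p


args = build_parser().parse_args(["rec1.nk", "--deidentify", "-o", "out"])
```

Keeping the parser in its own function makes it easy to unit-test argument handling without invoking a conversion.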
Conclusion
Building an open-source converter from Nihon Kohden formats to EDF(+) enables interoperability, reproducibility, and long-term accessibility of physiological recordings. While vendor formats vary and require careful reverse-engineering, Python libraries like pyedflib, MNE, and binary parsers (construct/struct) provide the building blocks. Implement robust parsing, preserve metadata and annotations, validate outputs, and provide clear provenance and de-identification options to create a production-ready converter.
If you want, I can: (a) examine a sample Nihon Kohden file you provide and sketch a parser for it, (b) expand the example into a full CLI tool with resampling and batch support, or (c) generate unit tests and CI config for the project.