Open‑Source Nihon Kohden → EDF(+) Converter: Python Tools and Examples

Converting electrophysiological recordings from proprietary formats into open standards is a common need in clinical research, neurophysiology, and sleep medicine. Nihon Kohden’s clinical EEG/PSG file formats are widely used in hospitals and labs, but their closed or semi-closed nature can create friction when sharing data, applying open-source analysis tools, or archiving for the long term. This article explains why converting Nihon Kohden files to the EDF(+) standard is useful, outlines legal and ethical considerations, reviews open-source Python tools that can help, and provides practical examples (including code) for building a reliable converter that preserves signals, annotations, and metadata.


Why convert Nihon Kohden to EDF(+)?

  • Interoperability: EDF(+) (European Data Format plus) is an open, widely supported format for storing multichannel biological signals (EEG, PSG, ECG, EMG). Converting enables interoperability with tools like MNE-Python, EEGLAB (via conversion), and many commercial and research packages.
  • Long-term accessibility: Open formats reduce vendor lock-in and make long-term archiving and reuse easier.
  • Reproducibility & sharing: Many journals and data repositories prefer or require open formats for reproducible research.
  • Tooling: EDF(+) supports annotations and event markers, which makes downstream analysis (like sleep staging or seizure detection) simpler.

Legal and ethical considerations

  • Confirm that you have the right to convert and share the data. Patient data may be protected by HIPAA, GDPR, or local laws. De-identify or anonymize patient identifiers before sharing.
  • Proprietary file formats may be subject to licensing terms. Check Nihon Kohden’s user agreement for any constraints on reading or converting files.
  • When publishing or sharing converted datasets, include clear provenance metadata that documents source files, conversion methods, software versions, and any de-identification steps (a small sidecar sketch follows this list).
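One lightweight way to capture that provenance is a small JSON sidecar written next to each converted file. The snippet below is a minimal sketch; the field names and the write_provenance helper are illustrative conventions, not a formal standard.

import json
import platform
from datetime import datetime, timezone

def write_provenance(sidecar_path, source_file, deidentified=True):
    # Record how and when a file was converted; field names are an example
    # convention, not a formal standard.
    record = {
        "source_file": source_file,
        "converted_at": datetime.now(timezone.utc).isoformat(),
        "converter": "nihon-kohden-to-edf example script",
        "python_version": platform.python_version(),
        "deidentified": deidentified,
        "notes": "patient identifiers removed before export",
    }
    with open(sidecar_path, "w", encoding="utf-8") as fp:
        json.dump(record, fp, indent=2)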

Overview of Nihon Kohden formats

Nihon Kohden devices generate several file types (extensions vary by device/firmware), commonly including combinations like:

  • .EEG data files containing proprietary binary sample blocks and header records, typically accompanied by .PNT (patient/recording metadata), .LOG (annotations/events), and .21E (electrode/channel name) files
  • On some systems, additional text or XML sidecar files with metadata and annotations

Because formats can vary across device models and firmware versions, a robust converter must:

  • Parse the header to extract sampling rates, channel labels, calibration/gain, and timebase.
  • Read multi-byte binary samples with correct endianness and sample encoding (e.g., 16-bit signed integers, 24-bit, or floats); a decoding sketch follows this list.
  • Handle annotations and event markers, mapping them to EDF(+) annotations with correct timestamps.
  • Preserve channel types (EEG, ECG, EMG, EOG), units (microvolts), and physical/digital scaling.
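As a sketch of the decoding step referenced above: assuming 16-bit little-endian signed samples stored channel-interleaved, with per-channel gain and offset taken from the header, numpy can decode and scale a block in a few lines. The layout and the gain/offset parameters are assumptions for illustration; the real encoding depends on your device and firmware.

import numpy as np

def decode_interleaved_int16(raw_bytes, n_channels, gain_uv_per_bit, offset_uv):
    # Assumed layout: little-endian int16, channel-interleaved samples.
    data = np.frombuffer(raw_bytes, dtype="<i2")
    data = data.reshape(-1, n_channels).T.astype(np.float64)  # (n_channels, n_samples)
    # Apply per-channel scaling to physical units (microvolts).
    gains = np.asarray(gain_uv_per_bit, dtype=np.float64)[:, None]
    offsets = np.asarray(offset_uv, dtype=np.float64)[:, None]
    return data * gains + offsets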

Open-source Python tools to help

Below are open-source Python libraries and utilities useful for reading proprietary files, manipulating signals, and writing EDF(+) files.

  • mne (https://mne.tools): Mature toolbox for EEG/MEG analysis. Supports reading EDF/EDF+ and many other formats, and includes a Nihon Kohden reader (mne.io.read_raw_nihon) for common .EEG file sets (a short example appears after the note below); provides data structures (Raw, Epochs, Events) and I/O utilities.
  • pyedflib (https://github.com/holgern/pyedflib): Lightweight EDF/EDF+ reader–writer in Python. Good for programmatic EDF(+) creation with control over headers and annotations.
  • numpy, scipy: Core numerical libraries for handling arrays, resampling, filtering, and conversions.
  • construct (https://construct.readthedocs.io): Declarative binary parsing library useful when reverse-engineering proprietary binary headers.
  • pandas: Helpful for handling tabular metadata and annotations.
  • h5py: Useful if you want to store or inspect intermediate data in HDF5 during conversion.
  • pySerial / vendor SDKs: Some devices or file exporters from Nihon Kohden may include SDKs or export tools; check vendor documentation for official APIs.

Note: There isn’t a single official, universal open-source reader for all Nihon Kohden variants; often you must implement file parsing based on the device and file type you have.
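That said, recent versions of MNE-Python do include a reader for the common .EEG/.PNT/.LOG/.21E file sets, and MNE can export directly to EDF when an EDF export backend (e.g., the edfio package) is installed. A minimal sketch, assuming one of those supported variants:

import mne

# Read a Nihon Kohden recording; the accompanying .PNT/.LOG/.21E files are
# picked up automatically if they sit next to the .EEG file.
raw = mne.io.read_raw_nihon("example.EEG", preload=True)

print(raw.info)          # channel names, sampling rate, measurement date
print(raw.annotations)   # events parsed from the .LOG file, if present

# Export to EDF (requires an EDF export backend installed alongside MNE).
raw.export("converted_mne.edf", fmt="edf", overwrite=True)

If this path works for your files, it saves most of the custom parsing described below; the hand-rolled converter remains useful for variants MNE does not recognize or when you need full control over headers and annotations.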


Design of a robust converter

A minimal converter should:

  1. Identify file type and version.
  2. Parse header metadata (patient ID, recording start time, channel list, sampling rates, gains, filters).
  3. Extract continuous sample data per channel, applying any required scaling to physical units.
  4. Extract annotations/events and map them to EDF(+) annotation format (onset, duration, description).
  5. Validate signal lengths and timestamps; handle dropped samples or discontinuities.
  6. Write EDF(+) using a tested library, ensuring correct header fields and annotation blocks.
  7. Optionally offer batch mode, resampling, channel selection, and de-identification.

Example workflow using pyedflib + custom reader

Below is an illustrative example showing how to structure a converter. This uses a placeholder function read_nihon_kohden(…) which you must implement or replace with device-specific parsing code. The example demonstrates how to take raw signals, channel metadata, and annotations and write them to an EDF(+) file with pyedflib.

# requirements:
# pip install numpy pyedflib

import numpy as np
import pyedflib


def read_nihon_kohden(file_path):
    """
    Placeholder reader: implement according to your Nihon Kohden file format.
    Should return:
      - signals: list of numpy arrays (one per channel) in physical units (e.g., microvolts)
      - fs: int or list of per-channel sampling frequencies (Hz)
      - chan_labels: list of channel labels
      - phys_min, phys_max: lists of physical min/max for each channel
      - dig_min, dig_max: lists of digital min/max (typically -32768..32767)
      - start_time: datetime of recording start
      - annotations: list of dicts with keys
        {'onset': seconds_from_start, 'duration': seconds, 'description': text}
    """
    raise NotImplementedError("Fill in Nihon Kohden file parsing here")


def write_edf_plus(out_path, signals, fs, chan_labels, phys_min, phys_max,
                   dig_min, dig_max, start_time, annotations):
    n_channels = len(signals)
    max_len = max(len(s) for s in signals)

    # Ensure all channels have the same length by padding with zeros if needed.
    sigs = np.zeros((n_channels, max_len), dtype=np.float64)
    for i, s in enumerate(signals):
        sigs[i, :len(s)] = s

    f = pyedflib.EdfWriter(out_path, n_channels=n_channels,
                           file_type=pyedflib.FILETYPE_EDFPLUS)
    channel_info = []
    for i in range(n_channels):
        ch_dict = {
            'label': chan_labels[i],
            'dimension': 'uV',
            # Older pyedflib versions use the key 'sample_rate' instead.
            'sample_frequency': fs if isinstance(fs, int) else fs[i],
            'physical_min': phys_min[i],
            'physical_max': phys_max[i],
            'digital_min': dig_min[i],
            'digital_max': dig_max[i],
            'transducer': '',
            'prefilter': ''
        }
        channel_info.append(ch_dict)

    f.setPatientCode("")  # de-identify or set patient fields as needed
    f.setTechnician("")
    f.setRecordingAdditional("Converted from Nihon Kohden")
    # EDF headers store a naive timestamp; drop any timezone info.
    f.setStartdatetime(start_time.replace(tzinfo=None))
    f.setSignalHeaders(channel_info)
    f.writeSamples(sigs)

    # Write annotations as EDF+ events (onset, duration, description).
    for ann in annotations:
        onset = float(ann['onset'])
        duration = float(ann.get('duration', 0.0))
        desc = ann.get('description', '')
        f.writeAnnotation(onset, duration, desc)
    f.close()


# Example usage:
if __name__ == "__main__":
    src = "example.nk"  # replace with a real file
    out = "converted.edf"
    (signals, fs, chan_labels, phys_min, phys_max,
     dig_min, dig_max, start_time, annotations) = read_nihon_kohden(src)
    write_edf_plus(out, signals, fs, chan_labels, phys_min, phys_max,
                   dig_min, dig_max, start_time, annotations)

Notes:

  • read_nihon_kohden must decode binary samples, apply per-channel scaling, and return arrays in physical units (microvolts).
  • pyedflib expects samples in physical units (e.g., microvolts) and scales them to the digital range given in each channel header.
  • If channels have different sampling rates, you can either resample to a common rate or keep the native rates: EDF(+) permits per-channel sampling rates (different sample counts per data record), but many downstream tools assume a single rate, so decide based on your downstream needs.

Handling common challenges

  • Variable sampling rates: If channels differ in sampling frequency, either resample (e.g., scipy.signal.resample_poly; see the sketch after this list) or write EDF with per-channel sample counts and accurate sample-frequency header fields. Document your choice.
  • Large files: Use chunked reading and streaming writes to avoid excessive memory usage. pyedflib supports writing in blocks.
  • Annotations with sub-second precision: EDF(+) supports fractional-onset annotations; ensure you convert timestamps precisely.
  • Missing metadata: If patient or recording metadata is missing, populate required EDF fields with placeholders and record provenance in the recording additional field.
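For instance, a lower-rate channel can be brought to a common target rate with scipy’s polyphase resampler; the rates below are made-up example values.

import numpy as np
from scipy.signal import resample_poly

fs_target = 500                             # desired common rate, Hz (example)
fs_channel = 200                            # native rate of this channel, Hz (example)
signal = np.random.randn(60 * fs_channel)   # stand-in for one minute of data

# Upsample/downsample by the reduced ratio to keep filter lengths small.
g = np.gcd(fs_target, fs_channel)
resampled = resample_poly(signal, fs_target // g, fs_channel // g)
print(len(signal), "->", len(resampled))    # 12000 -> 30000 samples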

Example: parsing a simple binary header (pattern example)

When reverse-engineering a proprietary binary header, such as a simplistic Nihon Kohden-like layout, construct or manual struct unpacking helps. Below is a conceptual snippet using struct for illustrative purposes only; adapt it to the real format.

import struct
from datetime import datetime

def parse_simple_header(fp):
    # Example: first 64 bytes, starting with ASCII recording start "YYYYMMDDhhmmss".
    hdr = fp.read(64)
    start_str = hdr[:14].decode('ascii')
    start_time = datetime.strptime(start_str, "%Y%m%d%H%M%S")
    # Next: channel count (2 bytes, little-endian), sampling rate (4 bytes), etc.
    # This is illustrative: the real format will differ.
    channel_count = struct.unpack('<H', hdr[14:16])[0]
    fs = struct.unpack('<I', hdr[16:20])[0]
    return start_time, channel_count, fs
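The same hypothetical header can be described declaratively with construct, which tends to stay readable as a reverse-engineered format description grows; the field names and sizes below mirror the illustrative struct example, not the real Nihon Kohden layout.

from construct import Struct, PaddedString, Int16ul, Int32ul

SimpleHeader = Struct(
    "start_str" / PaddedString(14, "ascii"),  # "YYYYMMDDhhmmss"
    "channel_count" / Int16ul,                # little-endian uint16
    "fs" / Int32ul,                           # little-endian uint32
)

with open("example.nk", "rb") as fp:
    hdr = SimpleHeader.parse(fp.read(20))
    print(hdr.start_str, hdr.channel_count, hdr.fs)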

Validation & testing

  • Compare signal statistics (min/max, mean, PSD) before and after conversion to ensure fidelity (a minimal check is sketched after this list).
  • Check annotations and event timings on a timeline plot.
  • Load converted EDF(+) in MNE or EDF viewers (Polyman, EDFbrowser) to verify channels and annotations.
  • Unit test the parsing of headers and sample extraction with known test files.
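A minimal version of these checks, assuming a recent MNE version and that the signals list from the converter example is still in scope, reloads the EDF with an independent reader and compares it channel by channel:

import numpy as np
import mne

# Reload the converted file with an independent reader.
raw = mne.io.read_raw_edf("converted.edf", preload=True)
converted = raw.get_data(units="uV")   # (n_channels, n_samples), voltage channels only

# 'signals' is the list of original per-channel arrays passed to the writer.
for i, original in enumerate(signals):
    n = min(len(original), converted.shape[1])
    diff = np.abs(converted[i, :n] - np.asarray(original[:n]))
    print(raw.ch_names[i], "max abs difference (uV):", diff.max())

print(raw.annotations)                 # spot-check that events survived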

Packaging and distribution

  • License: Choose a permissive license (MIT, BSD) or copyleft (GPL) depending on your goals. Ensure compliance with any vendor constraints.
  • CLI: Provide a simple command-line entrypoint (argparse) for batch conversion, de-identification flags, channel mapping, and logging (a skeleton follows this list).
  • Docker: Offer a Docker image for reproducible environments.
  • Tests: Include sample files (if licensing permits) or synthetic data for CI tests.
  • Documentation: Provide clear instructions on supported Nihon Kohden variants and how to add new parsers.
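A skeleton of such an entrypoint, wiring in the read_nihon_kohden and write_edf_plus functions from the earlier example (flag names are only suggestions):

import argparse
import logging

def main():
    parser = argparse.ArgumentParser(
        description="Convert Nihon Kohden recordings to EDF(+).")
    parser.add_argument("inputs", nargs="+", help="source Nihon Kohden file(s)")
    parser.add_argument("-o", "--output-dir", default=".",
                        help="directory for converted EDF files")
    parser.add_argument("--deidentify", action="store_true",
                        help="blank patient-identifying header fields")
    parser.add_argument("--channels", nargs="*", default=None,
                        help="subset of channel labels to keep")
    parser.add_argument("-v", "--verbose", action="store_true")
    args = parser.parse_args()

    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
    for src in args.inputs:
        logging.info("Converting %s", src)
        # ... call read_nihon_kohden(src), select channels, then write_edf_plus(...)

if __name__ == "__main__":
    main()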

Conclusion

Building an open-source converter from Nihon Kohden formats to EDF(+) enables interoperability, reproducibility, and long-term accessibility of physiological recordings. While vendor formats vary and require careful reverse-engineering, Python libraries like pyedflib, MNE, and binary parsers (construct/struct) provide the building blocks. Implement robust parsing, preserve metadata and annotations, validate outputs, and provide clear provenance and de-identification options to create a production-ready converter.

If you want, I can: (a) examine a sample Nihon Kohden file you provide and sketch a parser for it, (b) expand the example into a full CLI tool with resampling and batch support, or (c) generate unit tests and CI config for the project.
