Rust library for access to the JMdict

WARNING: Licensing on database files

The database files compiled into the crate are licensed from the Electronic Dictionary Research and Development Group under Creative Commons licenses. Applications linking this crate directly or indirectly must display appropriate copyright notices. Please refer to the EDRDG's license statement for details.

rust-jmdict

The jmdict crate contains the data from the JMdict file, a comprehensive multilingual dictionary of the Japanese language. The original JMdict file, included in this repository (and hence in releases of this crate), comes as XML. Instead of stuffing the XML into the binary directly, this crate parses the XML at compile time and generates an optimized representation for inclusion in the final binary.

In short, this crate does:

  • parse the XML structure of the JMdict database file,
  • provide an API to access its entries, and
  • provide compile-time flags (via Cargo features) to select the amount of information included in the binary.

This crate does NOT:

  • provide fast lookup into the database. You get a list of entries and then you can build your own indexing on top as required by your application.
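
Since indexing is left to the application, here is a minimal sketch of what such an index could look like. It does not use this crate's API: the Entry struct, sample_entries and build_index are hypothetical stand-ins invented for illustration, and the three sample words are hand-written rather than taken from the database. In a real application, the entries would come from the crate as described on docs.rs.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a dictionary entry. The crate's real entry type
// is richer; see docs.rs for the actual API.
struct Entry {
    kanji: Vec<&'static str>,
    readings: Vec<&'static str>,
    glosses: Vec<&'static str>,
}

// Hand-written sample data for illustration only (not read from JMdict).
fn sample_entries() -> Vec<Entry> {
    vec![
        Entry { kanji: vec!["猫"], readings: vec!["ねこ"], glosses: vec!["cat"] },
        Entry { kanji: vec!["橋"], readings: vec!["はし"], glosses: vec!["bridge"] },
        Entry { kanji: vec!["箸"], readings: vec!["はし"], glosses: vec!["chopsticks"] },
    ]
}

// Maps every kanji spelling and every reading to the positions of the
// entries that contain it, so that lookups avoid a scan over all entries.
fn build_index(entries: &[Entry]) -> HashMap<&'static str, Vec<usize>> {
    let mut index: HashMap<&'static str, Vec<usize>> = HashMap::new();
    for (pos, entry) in entries.iter().enumerate() {
        for key in entry.kanji.iter().chain(entry.readings.iter()) {
            index.entry(*key).or_default().push(pos);
        }
    }
    index
}

fn main() {
    let entries = sample_entries();
    let index = build_index(&entries);

    // A reading shared by several words yields several entries.
    if let Some(hits) = index.get("はし") {
        for &pos in hits {
            println!("{:?} {:?}", entries[pos].kanji, entries[pos].glosses);
        }
    }
}
```

A hash map keyed by spelling is only one option. Prefix search or fuzzy matching would call for a different structure, such as a sorted list or a trie.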

For specific examples, please check out the documentation on docs.rs.

Building

When packaging to crates.io, we cannot include the actual payload data (data/entrypack.json) because crates.io has a limit of 10 MiB per crate. (Technically, we could ship the data by depending on a series of data crates each slightly under 10 MiB, but I intend to be a good citizen and not abuse the shared infrastructure of crates.io needlessly.)

Hence the default strategy is to pull the entrypack (a preprocessed form of the JMdict contents) at build time from a server under the crate owner's control, currently https://dl.xyrillian.de/jmdict/. Each released crate version will have the most recent entrypack (as of the time of publication) hardcoded into its code, along with a SHA-256 checksum to ensure data integrity.

If downloading the entrypack at build time is not possible (e.g. because the build machine does not have internet access, or because curl is not installed on the build machine), download the entrypack beforehand and put its path in the RUST_JMDICT_ENTRYPACK environment variable when running cargo build.

For development purposes, when building from the repository, data/entrypack.json will be used instead. If this is not desired, set the RUST_JMDICT_ENTRYPACK environment variable to the value default to force the normal download behavior.
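
The rules above for choosing where the entrypack comes from can be summarized in a small decision function. This is a sketch of the documented behavior, not the crate's actual build.rs: the EntrypackSource enum and the resolve_entrypack function are hypothetical names, and the order in which the cases are checked is an assumption based on the description in this section.

```rust
use std::path::{Path, PathBuf};

// Hypothetical type describing where the entrypack is taken from.
#[derive(Debug, PartialEq)]
enum EntrypackSource {
    // Read the entrypack from a file on the build machine.
    LocalFile(PathBuf),
    // Fetch the entrypack from the download server.
    Download,
}

// Decides the source from the value of RUST_JMDICT_ENTRYPACK (if set) and
// from whether data/entrypack.json exists, as it does in a repository checkout.
fn resolve_entrypack(env_value: Option<&str>, repo_file_exists: bool) -> EntrypackSource {
    match env_value {
        // The special value "default" forces the normal download behavior.
        Some("default") => EntrypackSource::Download,
        // Any other value is taken as the path to a pre-downloaded entrypack.
        Some(path) => EntrypackSource::LocalFile(PathBuf::from(path)),
        // Variable unset, building from the repository: use the bundled file.
        None if repo_file_exists => {
            EntrypackSource::LocalFile(PathBuf::from("data/entrypack.json"))
        }
        // Variable unset, building a released crate: download.
        None => EntrypackSource::Download,
    }
}

fn main() {
    let env_value = std::env::var("RUST_JMDICT_ENTRYPACK").ok();
    let repo_file_exists = Path::new("data/entrypack.json").exists();
    println!("{:?}", resolve_entrypack(env_value.as_deref(), repo_file_exists));
}
```

Keeping the decision in a pure function that takes the environment value and the file check as arguments makes each case easy to test without touching the real environment.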

Contributing

If you plan to open issues or write code, please have a look at CONTRIBUTING.md.