Stefan Majewsky
WARNING: Licensing on database files
The database files compiled into the crate are licensed from the Electronic Dictionary Research and Development Group under Creative Commons licenses. Applications linking this crate directly or indirectly must display appropriate copyright notices. Please refer to the EDRDG's license statement for details.
rust-jmdict
The `jmdict` crate contains the data from the JMdict file, a comprehensive multilingual dictionary of the Japanese language. The original JMdict file, included in this repository (and hence in releases of this crate), comes as XML. Instead of stuffing the XML into the binary directly, this crate parses the XML at compile time and generates an optimized representation for inclusion in the final binary.
In short, this crate does:
- parse the XML structure of the JMdict database file,
- provide an API to access its entries, and
- provide compile-time flags (via Cargo features) to select the amount of information included in the binary.
This crate does NOT:
- provide fast lookup into the database. You get a list of entries and then you can build your own indexing on top as required by your application.
For specific examples, please check out the documentation on docs.rs.
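Since fast lookup is left to the application, the following is a minimal sketch of what "building your own indexing" could look like. It deliberately does not use this crate's API: the `Entry` struct, `sample_entries()` and `build_index()` are hypothetical stand-ins invented for illustration, so that the example runs without the dictionary data. In a real application, the keys would come from the entries exposed by the crate instead.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a dictionary entry. This is NOT the crate's
// actual entry type; it only carries the fields needed for the sketch.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    kanji: Vec<&'static str>,
    readings: Vec<&'static str>,
    glosses: Vec<&'static str>,
}

// Map every kanji spelling and every reading to the positions of the
// entries that contain it. Positions refer to the input slice.
fn build_index(entries: &[Entry]) -> HashMap<&'static str, Vec<usize>> {
    let mut index: HashMap<&'static str, Vec<usize>> = HashMap::new();
    for (pos, entry) in entries.iter().enumerate() {
        for key in entry.kanji.iter().chain(entry.readings.iter()) {
            let slot = index.entry(*key).or_default();
            // Avoid listing the same entry twice under one key.
            if slot.last() != Some(&pos) {
                slot.push(pos);
            }
        }
    }
    index
}

// A handful of made-up sample entries (illustration only).
fn sample_entries() -> Vec<Entry> {
    vec![
        Entry { kanji: vec!["蟹"], readings: vec!["かに"], glosses: vec!["crab"] },
        Entry { kanji: vec!["猫"], readings: vec!["ねこ"], glosses: vec!["cat"] },
        Entry { kanji: vec!["橋"], readings: vec!["はし"], glosses: vec!["bridge"] },
        Entry { kanji: vec!["箸"], readings: vec!["はし"], glosses: vec!["chopsticks"] },
    ]
}

fn main() {
    let entries = sample_entries();
    let index = build_index(&entries);

    // A reading shared by several entries yields several hits.
    if let Some(hits) = index.get("はし") {
        for &pos in hits {
            println!("{:?} -> {:?}", entries[pos].kanji, entries[pos].glosses);
        }
    }
}
```

A hash map keyed by spelling and reading is only one possible choice; applications that need prefix search or fuzzy matching would likely want a different structure.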
Building
When packaging to crates.io, we cannot include the actual payload data (`data/entrypack.json`) because crates.io has a limit of 10 MiB per crate. (Technically, we could ship the data by depending on a series of data crates each slightly under 10 MiB, but I intend to be a good citizen and not abuse the shared infrastructure of crates.io needlessly.)
Hence the default strategy is to pull the entrypack (a preprocessed form of the JMdict contents) at build time from a server under the crate owner's control, currently https://dl.xyrillian.de/jmdict/. Each released crate version will have the most recent entrypack (as of the time of publication) hardcoded into its code, along with a SHA-256 checksum to ensure data integrity.
If downloading the entrypack at build time is not possible (e.g. because the build machine does not have internet access, or because `curl` is not installed on the build machine), download the entrypack beforehand and put its path in the `RUST_JMDICT_ENTRYPACK` environment variable when running `cargo build`.
For development purposes, when building from the repository, `data/entrypack.json` will be used instead. If this is not desired, set the `RUST_JMDICT_ENTRYPACK` environment variable to `default` to force the normal download behavior.
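The rules described in this section can be summarized as a small decision function. The following is a sketch of that decision only, not the crate's actual `build.rs`: the `EntrypackSource` enum and the `resolve_entrypack()` function are hypothetical names, and the precedence shown (environment variable first, then the repository copy, then download) is an assumption derived from the prose above.

```rust
use std::path::{Path, PathBuf};

// Where the build would take the entrypack from (hypothetical type).
#[derive(Debug, PartialEq)]
enum EntrypackSource {
    // Use a file that is already on disk.
    LocalFile(PathBuf),
    // Fetch the entrypack from the download server.
    Download,
}

// Decide on the entrypack source.
// - `env_value`: contents of RUST_JMDICT_ENTRYPACK, if set
// - `repo_copy`: path of the in-repository entrypack
// - `repo_copy_exists`: whether that file is present (true for a repo checkout)
fn resolve_entrypack(
    env_value: Option<&str>,
    repo_copy: &Path,
    repo_copy_exists: bool,
) -> EntrypackSource {
    match env_value {
        // The special value "default" forces the normal download behavior.
        Some("default") => EntrypackSource::Download,
        // Any other value is taken as the path of a pre-downloaded entrypack.
        Some(path) => EntrypackSource::LocalFile(PathBuf::from(path)),
        // Unset variable: prefer the repository copy during development.
        None if repo_copy_exists => EntrypackSource::LocalFile(repo_copy.to_path_buf()),
        // Unset variable and no repository copy: download.
        None => EntrypackSource::Download,
    }
}

fn main() {
    let env_value = std::env::var("RUST_JMDICT_ENTRYPACK").ok();
    let repo_copy = Path::new("data/entrypack.json");
    let source = resolve_entrypack(env_value.as_deref(), repo_copy, repo_copy.exists());
    println!("entrypack source: {:?}", source);
}
```

Keeping the decision in a pure function, separate from reading the environment and the file system, makes each case easy to check in isolation.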
Contributing
If you plan to open issues or write code, please have a look at CONTRIBUTING.md.