Tools/en


The Kiwix tools are a set of scripts, mostly written in Perl, that help create content usable by Kiwix.

Kiwix is primarily designed as a tool to publish copies of Wikipedia, but every effort is made to ensure it would also be useful for:

As the heart of Kiwix is the HTML rendering engine Gecko, the objective of Kiwix tools is to produce:

Storage

We call such a coherent set of multimedia content a dump or a corpus. These dumps can take many forms: previous versions of Kiwix used a simple directory layout; Moulinwiki used a file compressed with bzip2 and indexed in an SQLite database.

Today, Kiwix uses the ZIM format: a single file contains the entire dump, allowing fast access, high compression and configurability.

ZIM is an open, standard format created and maintained by the openZIM project, of which Kiwix is a founding member. ZIM is based on an older format, Zeno, which was created by the Berlin publishing house Directmedia and used for the German Wikipedia released on CD-ROM. The Zeno format was later abandoned, but we wanted to continue its development. Time will tell whether this initiative succeeds, but the goal is to establish a standard and thus simplify the problem of storing dumps for everyone. In any case, it is already the best free solution.

Generating ZIM Files From Wikis

The question of how to generate a dump is not a simple one. For several reasons, Kiwix has so far concentrated on generating dumps that offer a selection of content from a given wiki site, even though publishing complete Wikipedia dumps remains a clear objective. The Kiwix tools are designed to assist in selecting entries, replicating content from the online site to a local mirror, and then converting the mirror into a ZIM file.

This is not the only way to generate a dump; in theory, it can be done in several ways. Here is a short, non-exhaustive list of approaches:

There are certain constraints that should be taken into account. Here are the most important ones:

Prerequisites

You'll need a bunch of Perl modules to run these scripts. Here is a list of modules one tester (User:Ijon) had to install on top of a plain Perl 5.10 installation on Ubuntu Linux. Your mileage may vary. Install them using CPAN (perl -MCPAN -e shell), CPANPLUS (cpanp(1)), or your distro's Perl packaging mechanism.

I managed to install these by installing this subset and allowing automatic installation of dependencies:

Debian/Ubuntu dependencies

sudo apt-get install liblog-log4perl-perl libdata-dumper-simple-perl libxml-simple-perl \
    libxml-libxml-perl libarray-printcols-perl libgetargs-long-perl \
    liburi-perl libhtml-linkextractor-perl \
    libhtml-parser-perl libdbd-pg-perl
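
Once the packages are installed, it can be handy to confirm that the modules actually load. The following is a minimal sketch, not part of the Kiwix tools: the function name missing_perl_modules is hypothetical, and the module names are assumed from the Debian package names above, so adjust them if your installation differs.

```shell
#!/bin/sh
# Print the name of every Perl module in the argument list that fails to load.
# (Hypothetical helper, not shipped with the Kiwix tools.)
missing_perl_modules() {
    for module in "$@"; do
        # `perl -MName -e 1` exits non-zero when the module cannot be loaded.
        if ! perl -M"$module" -e 1 >/dev/null 2>&1; then
            printf '%s\n' "$module"
        fi
    done
}

# Module names below are assumptions derived from the Debian package names.
missing_perl_modules \
    Log::Log4perl Data::Dumper::Simple XML::Simple XML::LibXML \
    Array::PrintCols Getargs::Long URI HTML::LinkExtractor \
    HTML::Parser DBD::Pg
```

An empty output means every listed module loaded; any name that is printed still needs to be installed, through apt-get or CPAN.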

Usage

Here is a list of available scripts (many of them are specific to MediaWiki):

MediaWiki Maintenance

Mirroring Tools

Dumping Tools

ZIM Generation

Virtual machine

We have prepared a VM to help people build ZIM files from their HTML files. Download it there. The Unix login and password are root/kiwix; for PostgreSQL they are postgres/kiwix. To build your ZIM file, go to root/dumping_tools/scripts and use buildZimFileFromdirectory.pl.
