This recipe fetches pages through Special:Export using Perl and LWP.
If you already have a list of pages, you can skip ahead to the export script at the end.
The UNESCO OER wiki runs an older MediaWiki install, so the pages in a category cannot be retrieved via the API and the list has to be supplied manually. One option is to use Perl with WWW::Mechanize to click the 'Add Pages in Category' button on Special:Export and then retrieve the resulting list. I might add a recipe for this.
With more recent installs of MediaWiki, the list of pages within a category can be obtained via the API:
api.php?action=query&list=categorymembers&cmtitle=Category:Access2OER
This is easily accomplished using MediaWiki::API:
use MediaWiki::API;

my $mw = MediaWiki::API->new();
$mw->{config}->{api_url} = 'http://.../api.php';
$mw->{config}->{on_error} = \&on_error;

sub on_error {
    print "Error code: " . $mw->{error}->{code} . "\n";
    print $mw->{error}->{stacktrace} . "\n";
    die;
}

# get a list of articles in category
my $articles = $mw->list( {
    action  => 'query',
    list    => 'categorymembers',
    cmtitle => 'Category:Access2OER',
    cmlimit => 'max',
} ) || die $mw->{error}->{code} . ': ' . $mw->{error}->{details};

# and print the article titles
foreach (@{$articles}) {
    print "$_->{title}\n";
}
You can determine the templates in use on a particular page as follows:
api.php?action=query&prop=templates&titles=Main%20Page
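The same query can be sketched with MediaWiki::API. This is a minimal example rather than a tested recipe: the helper name `template_titles` and the command-line handling are assumptions made for illustration, and the wiki is only contacted when an api.php URL is passed as the first argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect template titles from a decoded prop=templates response.
# Pages are keyed by page id under query->pages; a page that uses no
# templates has no 'templates' key at all.
sub template_titles {
    my ($response) = @_;
    my $pages = $response->{query}{pages} || {};
    my %seen;
    for my $page (values %$pages) {
        $seen{ $_->{title} } = 1 for @{ $page->{templates} || [] };
    }
    return sort keys %seen;
}

# Only contact a wiki when an api.php URL is given on the command line
# (hypothetical usage: perl templates.pl http://.../api.php "Main Page").
if (@ARGV) {
    require MediaWiki::API;
    my ($api_url, $title) = @ARGV;
    $title = 'Main Page' unless defined $title;

    my $mw = MediaWiki::API->new( { api_url => $api_url } );
    my $response = $mw->api( {
        action  => 'query',
        prop    => 'templates',
        titles  => $title,
        tllimit => 'max',
    } ) || die $mw->{error}->{code} . ': ' . $mw->{error}->{details};

    print "$_\n" for template_titles($response);
}
```

Template titles come back with their namespace prefix (for example 'Template:...'), so they can be added to the export list as they are if the templates should be exported along with the page.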
Subpages can be determined via the API using apprefix. For example, to get all pages starting with 'Tutorials/' (i.e. the proper subpages of Tutorials):
action=query&list=allpages&aplimit=100&apprefix=Tutorials/
The query above does not return the 'Tutorials' page itself, so you need to add that page to the list separately.
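As a rough sketch, the subpage query and the extra parent page can be combined with MediaWiki::API as follows. The helper name `with_subpages` and the command-line handling are assumptions for illustration; the wiki is only contacted when an api.php URL is passed as the first argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the export list for a page and its subpages: the parent title first,
# then every title from a list=allpages result (an array ref of hashes).
sub with_subpages {
    my ($parent, $subpages) = @_;
    return ($parent, map { $_->{title} } @{ $subpages || [] });
}

# Only contact a wiki when an api.php URL is given on the command line
# (hypothetical usage: perl subpages.pl http://.../api.php Tutorials).
if (@ARGV) {
    require MediaWiki::API;
    my ($api_url, $parent) = @ARGV;
    $parent = 'Tutorials' unless defined $parent;

    my $mw = MediaWiki::API->new( { api_url => $api_url } );
    my $subpages = $mw->list( {
        action   => 'query',
        list     => 'allpages',
        apprefix => "$parent/",
        aplimit  => 'max',
    } ) || die $mw->{error}->{code} . ': ' . $mw->{error}->{details};

    print "$_\n" for with_subpages($parent, $subpages);
}
```

The titles are printed one per line, so the output can be used directly as the page list for the export.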
When you have your list of pages, the following script reads the titles from standard input (one per line) and exports them:
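#!/path/to/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common;

my $myurl = "http://oerwiki.iiep-unesco.org/index.php?title=Special:Export";

# Optional label for the output file name, taken from the first
# command-line argument (empty if none is given).
my $category = @ARGV ? shift @ARGV : '';

# Read the page titles, one per line, from standard input.
my $pages = '';
while (<STDIN>) {
    $pages .= $_;
}

my %formfields = (
    "pages"   => $pages,
    "curonly" => "true",
    "action"  => "submit",
    "submit"  => "Export",
);

my $ua = LWP::UserAgent->new;
$ua->protocols_allowed( ['http'] );
my $page = $ua->request(POST $myurl, \%formfields);

# Timestamp for the file name, with slashes and the trailing newline removed.
(my $date = `date`) =~ s/[\/\n]//g;

if ($page->is_success) {
    open my $fh, '>', "Special Export $category $date.xml"
        or die "Cannot write export file: $!";
    print $fh $page->content;
    close $fh;
    print "Done.\n";
} else {
    print $page->message, "\n";
}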