User:Rich Farmbrough/Disambig scripts
Jump to navigation
Jump to search
Scripting
Note these scripts work with the SQL databse dumps, not with the XML dumps. Create this perl script dab.pl
#!/usr/bin/perl
while (<>) {
@lines=split /\[INSERT INTO \`cur\` VALUES \(|\d\'\),\(|\d\'\);\n/;
foreach $line (@lines){
$line =~ m/\d+,(\d+),'(.+?[^\\])','(.+?[^\\])','/;
$space=$1;
$name=$2;
$text=$3;
if ($space==0) {
if ($text =~ m/\{\{disambig\}\}/){
print $name, "\n";
}
elsif ($text =~ m/\{\{msg:disambig\}\}/){
print $name, "\n";
}
}
}
}
run
perl dab.pl ddddddddd_cur_table.sql > dab.txt
Where dddddd is the appropriate date (Takes a few minutes, I didn't time it.)
then create countdab.pl
#!/usr/bin/perl
%dab=();
open (DAB,"dab.txt");
while (<DAB>){
chomp();
$dab{$_}=0;
}
$i=0;
while (<>) {
@lines=split /\[INSERT INTO \`cur\` VALUES \(|\d\'\),\(|\d\'\);\n/;
foreach $line (@lines){
$line =~ m/\d+,(\d+),'(.+?[^\\])','(.+?[^\\])','/;
if ($1==0){
$_=$3;
@links= /\[\[(.*?)(?:\||\]\])/g;
foreach $link (@links){
if ( exists $dab{$link} ) {
$dab{$link}++;
}
}
}
}
print STDERR ".",++$i;
}
$i=0;
foreach $key (sort { $dab{$b} <=> $dab{$a} } keys %dab) {
print "# [[", $key,"]] ([[Special:Whatlinkshere/",$key,"|links]] to ",$dab{$key}," articles)\n";
if ($i++>200) {last;}
}
and run with something like
perl dabcount.pl dddddddddd_cur_table.sql > count.txt
where dddddddddd is the appropriate date. (takes about twenty minutes)
and you have your result. It's not perfect because it ignores nowiki, comments etc. but for a disambiguation league table it's good enough.