User:Rich Farmbrough/Disambig scripts

From Wikipedia, the free encyclopedia

Scripting

Download the database dump.

Note that these scripts work with the SQL database dumps, not with the XML dumps. Create this Perl script, dab.pl, which prints the title of every main-namespace page that uses the {{disambig}} template:



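#!/usr/bin/perl
# dab.pl: list the titles of main-namespace disambiguation pages
# found in an SQL dump of the cur table.
use strict;
use warnings;

while (my $sqlline = <>) {
	# Each INSERT statement holds many records: (...),(...),...;
	my @records = split /^INSERT INTO `cur` VALUES \(|\d'\),\(|\d'\);\n/, $sqlline;
	foreach my $record (@records) {
		# Fields: cur_id, cur_namespace, cur_title, cur_text, ...
		# Skip chunks that don't parse, so stale $1..$3 are never used.
		next unless $record =~ m/\d+,(\d+),'(.*?[^\\])','(.*?[^\\])','/;
		my ($space, $name, $text) = ($1, $2, $3);
		next unless $space == 0;    # main (article) namespace only
		# {{disambig}}, {{Disambig}} or the older {{msg:disambig}} form
		if ($text =~ m/\{\{(?:msg:)?[Dd]isambig\}\}/) {
			print $name, "\n";
		}
	}
}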
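To see what the split and the match actually extract, the parsing step can be tried on a hand-made INSERT line. This is only a sketch: `parse_records` is a hypothetical helper, and the sample line is invented. It assumes the field order cur_id, cur_namespace, cur_title, cur_text, ... that the regex above relies on, and that each record ends with a digit followed by `')`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one "INSERT INTO `cur` VALUES (...),(...);" line
# into records and return [namespace, title, text] for each record that parses.
# Assumes the field order cur_id, cur_namespace, cur_title, cur_text, ...
sub parse_records {
    my ($sql) = @_;
    my @records;
    for my $chunk (split /^INSERT INTO `cur` VALUES \(|\d'\),\(|\d'\);\n?/, $sql) {
        # The leading empty chunk (and anything else unparseable) is skipped.
        next unless $chunk =~ m/\d+,(\d+),'(.*?[^\\])','(.*?[^\\])','/;
        push @records, [$1, $2, $3];
    }
    return @records;
}

# Invented sample: one article and one talk page (namespace 1).
my $sample = "INSERT INTO `cur` VALUES "
  . "(1,0,'Mercury','{{disambig}} [[Mercury (planet)]]','',0,'x','20040101000000'),"
  . "(2,1,'Mercury','talk page','',0,'x','20040101000001');\n";

for my $rec (parse_records($sample)) {
    my ($ns, $title, $text) = @$rec;
    print "$ns\t$title\n";    # prints "0<TAB>Mercury" then "1<TAB>Mercury"
}
```

Only the first record would be reported by dab.pl, since the second is in namespace 1 and has no disambig template.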


Run it:

perl dab.pl dddddddd_cur_table.sql > dab.txt

where dddddddd is the date of the dump. (This takes a few minutes; I didn't time it.)

Then create countdab.pl, which counts how many articles link to each page listed in dab.txt and prints the top 200:



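#!/usr/bin/perl
# countdab.pl: count how many main-namespace articles link to each
# disambiguation page listed in dab.txt, and print the top 200.
use strict;
use warnings;

my %dab;
open(my $dabfh, '<', 'dab.txt') or die "Cannot open dab.txt: $!";
while (my $title = <$dabfh>) {
	chomp $title;
	$dab{$title} = 0;
}
close $dabfh;

my $i = 0;
while (my $sqlline = <>) {
	my @records = split /^INSERT INTO `cur` VALUES \(|\d'\),\(|\d'\);\n/, $sqlline;
	foreach my $record (@records) {
		next unless $record =~ m/\d+,(\d+),'(.*?[^\\])','(.*?[^\\])','/;
		my ($space, $text) = ($1, $3);
		next unless $space == 0;    # main (article) namespace only
		# Link targets: the text after [[ up to the first | or ]]
		my @links = $text =~ /\[\[(.*?)(?:\||\]\])/g;
		my %seen;    # count each article at most once per target
		foreach my $link (@links) {
			# Match the stored title form: drop any #section, trim,
			# spaces -> underscores, capital first letter.
			$link =~ s/#.*//s;
			$link =~ s/^\s+|\s+$//g;
			$link =~ tr/ /_/;
			$link = ucfirst $link;
			if (exists($dab{$link}) && !$seen{$link}++) {
				$dab{$link}++;
			}
		}
	}
	print STDERR ".", ++$i;    # progress: one tick per INSERT line
}
print STDERR "\n";

$i = 0;
foreach my $key (sort { $dab{$b} <=> $dab{$a} } keys %dab) {
	print "# [[", $key, "]] ([[Special:Whatlinkshere/", $key, "|links]] to ", $dab{$key}, " articles)\n";
	last if ++$i >= 200;
}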
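The counting step can be tried in isolation on a few made-up article texts. This sketch normalises each link target to the stored title form, assuming (as in MediaWiki dumps) that titles are stored with underscores instead of spaces and with a capitalised first letter, and it counts each article at most once per target. `normalise_link` and `count_dab_links` are hypothetical helper names, and the sample titles and texts are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a raw link target such as "mercury (planet)#Orbit"
# into the underscore form used by cur_title ("Mercury_(planet)").
sub normalise_link {
    my ($link) = @_;
    $link =~ s/#.*//s;          # drop any section anchor
    $link =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    $link =~ tr/ /_/;           # spaces -> underscores
    return ucfirst $link;       # first letter is case-insensitive on the wiki
}

# Hypothetical helper: for each disambiguation title, count how many
# articles link to it (an article linking twice still counts once).
sub count_dab_links {
    my ($dab, @texts) = @_;     # $dab: hash ref of title => 0
    for my $text (@texts) {
        my %seen;
        for my $raw ($text =~ /\[\[(.*?)(?:\||\]\])/g) {
            my $link = normalise_link($raw);
            $dab->{$link}++ if exists($dab->{$link}) && !$seen{$link}++;
        }
    }
    return $dab;
}

my %dab = (Mercury => 0, Java => 0);
my @articles = (
    "The [[mercury]] thermometer and [[Mercury|the metal]].",
    "Written in [[Java]] for the [[Java#Platform|platform]].",
    "See [[Mercury (planet)]] and [[ Mercury ]].",
);
count_dab_links(\%dab, @articles);

# Same league-table ordering as countdab.pl, ties broken by title.
for my $title (sort { $dab{$b} <=> $dab{$a} || $a cmp $b } keys %dab) {
    print "$title: $dab{$title}\n";    # prints "Mercury: 2" then "Java: 1"
}
```

Counting each article once per target keeps the figure in line with the "links to N articles" wording of the report; counting every link occurrence instead would inflate pages that one article links to repeatedly.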


and run it with something like

perl countdab.pl dddddddd_cur_table.sql > count.txt

where dddddddd is again the date of the dump. (This takes about twenty minutes.)

And you have your result. It's not perfect, because it ignores nowiki sections, HTML comments, etc., but for a disambiguation league table it's good enough.
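If the nowiki and comment problem matters, one rough fix is to strip those sections from the article text before extracting links. This is a sketch only: `strip_ignored_markup` is a hypothetical helper, the sample text is invented, and the simple non-greedy regexes do not handle nested or unterminated tags, `<pre>` blocks, or the self-closing `<nowiki/>` form.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove HTML comments and <nowiki>...</nowiki>
# sections so that links inside them are not counted.
sub strip_ignored_markup {
    my ($text) = @_;
    $text =~ s/<!--.*?-->//gs;                 # HTML comments
    $text =~ s{<nowiki>.*?</nowiki>}{}gis;     # nowiki sections
    return $text;
}

my $text = "See [[Mercury]]. <!-- old: [[Java]] --> "
         . "Type <nowiki>[[Java]]</nowiki> to link.";

# Same link-extraction regex as countdab.pl, applied to the cleaned text.
my @links = strip_ignored_markup($text) =~ /\[\[(.*?)(?:\||\]\])/g;
print join(",", @links), "\n";    # prints: Mercury
```

In countdab.pl this would be applied to the article text just before the [[...]] links are extracted.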