This repository was archived by the owner on Nov 25, 2019. It is now read-only.

problem when compiling Turkish wiktionary #24

Open
itkach opened this issue Jan 30, 2013 · 2 comments

Comments


itkach commented Jan 30, 2013

Copied from aarddict/desktop#26
Hello
OS: Ubuntu 12.04 i386
After compiling tr.wiktionary.org, the articles contain no word definitions, only information such as language, tense, etc.
Example:
Original article:

ffordd
Galce
Ad
Anlamlar
[1] yol

My case (from aarddict):

ffordd
Galce
Ad

As you can see, there is no translation.
I've followed the instructions at http://aarddict.org/aardtools/doc/aardtools.html, except that:

  1. installed libicu48 instead of libicu38
  2. had to give executable permission to env-aard/bin/activate
  3. simplewiki-20101026-pages-articles.cdb is not a file, but a folder
  4. had lots of these messages during aardc wiki ... execution:
.../env-aard/local/lib/python2.7/site-packages/aardtools/mwaardhtmlwriter.py:356: FutureWarning: The behavior of this method will change in future versions.  Use specific 'len(elem)' or 'elem is not None' test instead.
  not (element.getchildren() or element.text or element.tail) and parent):

thank you


itkach commented Jan 30, 2013

@microspace

After compiling tr.wiktionary.org, the articles contain no word definitions, only information such as language, tense, etc.

The current version of aardtools filters out navigational cruft, based mostly on enwiki and a few other big wikis, so the filters may not suit other types of wikis (take a look at aardtools/mwaardhtmlwriter.py to see what's excluded). I made some changes to compile enwiktionary, although this is a hack and won't be merged.

The proper way to fix this will be available after I merge #21, which basically fixes #11 and enables creating individual filter sets per wiki, outside of the code (sorry @doozan it's taking so long - I almost have it done, just need to find time to clean up and release).
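
To illustrate the idea of per-wiki filter sets, here is a minimal sketch. It is not the code from #21: the `FILTER_SETS` mapping, the class names in it, and the `strip_filtered` helper are all hypothetical, and it uses the standard library's `xml.etree.ElementTree` rather than whatever aardtools uses internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical filter sets keyed by wiki name. The class names are made up
# for illustration; in practice they would live in a file outside the code.
FILTER_SETS = {
    "enwiki": {"navbox", "metadata"},
    "trwiktionary": set(),  # filter nothing for this wiki
}


def strip_filtered(root, wiki):
    """Remove every descendant of root whose class attribute names a filtered class.

    Note: removing an element also drops its tail text in ElementTree.
    """
    excluded = FILTER_SETS.get(wiki, set())
    # Collect (parent, child) pairs first so the tree is not mutated mid-iteration.
    doomed = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if excluded & set(child.get("class", "").split())
    ]
    for parent, child in doomed:
        parent.remove(child)
    return root


article = ET.fromstring('<div><p>yol</p><div class="navbox">nav</div></div>')
strip_filtered(article, "enwiki")
print(ET.tostring(article, encoding="unicode"))
```

With a filter set like this, a wiki whose definitions happen to sit inside an excluded class could be given its own, smaller set without touching the code.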

2) had to give executable permission to env-aard/bin/activate

Take that up with virtualenv; it is not part of this project.

3) simplewiki-20101026-pages-articles.cdb is not a file, but a folder

I don't see where it is claimed to be a file (and technically directories are files anyway)

4) had lots of these messages during aardc wiki ... execution:

This is #13; I believe it is fixed in a9b192a.
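
For reference, the warning is raised when an element is tested for truth directly, and it asks for an explicit `len(elem)` or `elem is not None` test instead. Below is a minimal sketch of that kind of rewrite. It is not the actual change in a9b192a: the function name `is_empty_child` is hypothetical, and the sketch assumes that `parent` in the quoted line is either an element or `None`.

```python
import xml.etree.ElementTree as ET


def is_empty_child(element, parent):
    """Return True if element has no children, text, or tail and has a parent.

    Uses len(element) and 'parent is not None' instead of truth-testing
    elements, which is what the FutureWarning recommends.
    """
    has_content = len(element) > 0 or bool(element.text) or bool(element.tail)
    return not has_content and parent is not None


root = ET.fromstring("<p><b/><i>x</i></p>")
print(is_empty_child(root[0], root))  # <b/> has no children, text, or tail
print(is_empty_child(root[1], root))  # <i> has text
```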


itkach commented Mar 6, 2013

It looks like content filtering is not the issue: in the latest version of aardtools nothing is filtered out by default, yet translations are still missing. Other wiktionaries compile fine, though, so it looks like there's something special about trwiktionary that mwlib doesn't handle properly.
