Codebase list virt-viewer / 99f960b
po: minimize & canonicalize translations stored in git

Similar to the virt-viewer.pot, .po files contain line numbers and file names identifying where in the source a translatable string comes from. The source locations in the .po files are thrown away and replaced with content from the virt-viewer.pot whenever msgmerge is run, so this is not precious information that needs to be stored in git.

When msgmerge processes a .po file, it will add in any msgids from the virt-viewer.pot that were not already present. Thus, if a particular msgid currently has no translation, it can be considered redundant and again does not need storing in git.

When msgmerge processes a .po file and can't find an exact existing translation match, it will try to do fuzzy matching instead, marking such entries with a "# fuzzy" comment to alert the translator to take a look and either discard, edit or accept the match. Looking at the existing fuzzy matches in .po files shows that the quality is awful, with many having a completely different set of printf format specifiers between the msgid and fuzzy msgstr entry. Fortunately when msgfmt generates the .gmo, the fuzzy entries are all ignored anyway. The fuzzy entries could be useful to translators if they were working on the .po files directly from git, but Virt-Viewer outsourced translation to the Fedora Zanata system, so keeping fuzzy matches in git is not much help.

Finally, by default msgids are sorted based on source location. Thus, if a bit of code with translatable text is moved from one file to another, it may shift around in the .po file, despite the msgid itself not changing. If the msgids were sorted alphabetically, the .po files would have stable ordering when code is refactored.

This patch takes advantage of the above observations to canonicalize and minimize the content stored for .po files in git. Instead of storing the real .po files, we now store .mini.po files. The .mini.po files are the same file format as .po files, but have no source location comments, are sorted alphabetically, and all fuzzy msgstrs and msgids with no translation are discarded. This cuts the size of content in the po directory.

Users working from a virt-viewer git checkout who need the full .po files can run "make update-po", which merges the virt-viewer.pot and .mini.po file to create a .po file containing all the content previously stored in git.

Conversely, if a full .po file has been modified, for example by downloading new content from Zanata, the .mini.po files can be updated by running "make update-mini-po". The resulting diffs of the .mini.po file will clearly show the changed translations without any of the noise that previously obscured content. Being able to see content changes clearly actually identified a bug in the zanata python client where it was adding bogus "fuzzy" annotations to many messages:

Users working from virt-viewer releases should not see any difference in behaviour, since the tarballs only contain the full .po files, not the .mini.po files.

As an added benefit, generating tarballs with "make dist" will no longer cause creation of dirty files in git, since it won't touch the .mini.po files, only the .po files which are no longer kept in git.

The languages are minimized in the following commit since it is a large mechanical process.

Signed-off-by: Daniel P. Berrangé <>

Daniel P. Berrangé 5 years ago
3 changed file(s) with 101 addition(s) and 24 deletion(s).
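#!/usr/bin/perl

use strict;
use warnings;

# Filter a .po file, printing only the blank-line separated entries
# that carry a usable translation.

my @block;       # lines of the entry currently being read
my $msgstr = 0;  # true once the entry's msgstr line has been seen
my $empty = 0;   # true while the msgstr has no non-empty string
my $unused = 0;  # true once an obsolete "#~ msgstr" entry has been seen
my $fuzzy = 0;   # true if the current entry is marked fuzzy

while (<>) {
    if (/^$/) {
        # A blank line ends the entry: keep it only if it is
        # translated, not obsolete and not fuzzy.
        if (!$empty && !$unused && !$fuzzy) {
            print @block;
        }
        @block = ();
        $msgstr = 0;
        $fuzzy = 0;
        push @block, $_;
    } else {
        if (/^msgstr/) {
            $msgstr = 1;
            $empty = 1;
        }
        if (/^#.*fuzzy/) {
            $fuzzy = 1;
        }
        if (/^#~ msgstr/) {
            $unused = 1;
        }
        # Any non-empty quoted string after msgstr counts as a translation.
        if ($msgstr && /".+"/) {
            $empty = 0;
        }
        push @block, $_;
    }
}

# Flush the final entry, which has no trailing blank line. The fuzzy
# check is applied here too, so a fuzzy last entry is also discarded.
if (@block && !$empty && !$unused && !$fuzzy) {
    print @block;
}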
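The block filter above can be hard to follow as a line-by-line state machine. As a reading aid only, here is a minimal Python sketch of the same idea, assuming entries are separated by single blank lines. The function name `minimize_po` is hypothetical and is not part of the patch, and unlike the Perl script the sketch evaluates each entry independently rather than carrying flags across entries.

```python
import re


def minimize_po(text):
    """Return .po text with fuzzy, obsolete and untranslated entries removed.

    Hypothetical helper for illustration; the patch itself uses a Perl filter.
    """
    kept = []
    for block in text.split("\n\n"):
        lines = block.splitlines()
        # Entries flagged fuzzy in a comment line are dropped.
        fuzzy = any(re.match(r"#.*fuzzy", line) for line in lines)
        # Obsolete entries are commented out with "#~".
        obsolete = any(line.startswith("#~ msgstr") for line in lines)
        # An entry is translated if a non-empty quoted string appears
        # on or after its msgstr line.
        in_msgstr = False
        translated = False
        for line in lines:
            if line.startswith("msgstr"):
                in_msgstr = True
            if in_msgstr and re.search(r'".+"', line):
                translated = True
        if translated and not fuzzy and not obsolete:
            kept.append(block.rstrip("\n"))
    return "\n\n".join(kept) + "\n"
```

Note that the header entry (empty msgid, msgstr followed by quoted metadata lines) is kept, because its continuation lines are non-empty strings.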
 COPYRIGHT_HOLDER = The Virt Viewer authors
[...]
 LANGS := \
 	af am anp ar as ast bal be bg \
[...]
 	-e "s|Copyright (C) YEAR|Copyright (C) $$(date +'%Y')|" \
 	$(NULL)
 
-# Although they're in EXTRA_DIST, we still need to
-# copy these again, because update-gmo will change
-# their content, and dist-hook runs after the
-# things in EXTRA_DIST are copied.
-dist-hook: $(GMOFILES)
-	cp -f $(POTFILE) $(distdir)/
-	cp -f $(POFILES) $(distdir)/
-	cp -f $(GMOFILES) $(distdir)/
+all: update-po
 
 update-po: $(POFILES)
 
 update-gmo: $(GMOFILES)
 
+update-mini-po: $(POTFILE)
+	for lang in $(LANGS); do \
+	  echo "Minimizing $$lang content" && \
+	  $(MSGMERGE) --no-location --no-fuzzy-matching --sort-output \
+	    $(srcdir)/$$lang.po $(POTFILE) | \
+	  $(SED) $(SED_PO_FIXUP_ARGS) | \
+	  $(PERL) $(top_srcdir)/build-aux/ > \
+	    $(srcdir)/$$ ; \
+	done
+
 push-pot: $(POTFILE)
 	zanata push --push-type=source
 
 pull-po: $(POTFILE)
 	zanata pull --create-skeletons
-	$(MAKE) update-po
+	$(MAKE) update-mini-po
 	$(MAKE) update-gmo
[...]
 	$(SED) $(SED_PO_FIXUP_ARGS) < $@-t > $@
 	rm -f $@-t
 
-$(srcdir)/%.po: $(POTFILE)
-	$(MSGMERGE) --backup=off --no-fuzzy-matching --update $@ $(POTFILE)
+$(srcdir)/%.po: $(srcdir)/ $(POTFILE)
+	$(MSGMERGE) --no-fuzzy-matching $< $(POTFILE) | \
+	  $(SED) $(SED_PO_FIXUP_ARGS) > $@
 
 $(srcdir)/ $(srcdir)/%.po
 	rm -f $@ $@-t
 Source repository
 =================
 
-The virt-viewer GIT repository stores the master "virt-viewer.pot" file and
-full "po" files for translations. The master "virt-viewer.pot" file can be
-re-generated using
+The virt-viewer GIT repository does NOT store the master "virt-viewer.pot"
+file, nor does it store full "po" files for translations. The master
+"virt-viewer.pot" file can be generated at any time using
 
     make virt-viewer.pot
 
-The full po files can have their source locations and msgids updated using
+The translations are kept in minimized files that are the same file format
+as normal po files but with all redundant information stripped and messages
+re-ordered. The key differences between the ".mini.po" files in GIT and the
+full ".po" files are
+
+  - msgids with no current translation are omitted
+  - msgids are sorted in alphabetical order not source file order
+  - msgids with a msgstr marked "fuzzy" are discarded
+  - source file locations are omitted
+
+The full po files can be created at any time using
 
     make update-po
 
-Normally these updates are only done when either refreshing translations from
-Zanata, or when creating a new release.
+This merges the "virt-viewer.pot" with the "$" for each language,
+to create the "$LANG.po" files. These are included in the release archives
+created by "make dist".
+
+When a full po file is updated, changes can be propagated back into the
+minimized po files using
+
+    make update-mini-po
+
+Note, however, that this is generally not something that should be run by
+developers normally, as it is triggered by 'make pull-po' when refreshing
+content from Zanata.
+
 
 Zanata web service
 ==================
[...]
 As such, changes to translations will generally NOT be accepted as patches
 directly to virt-viewer GIT. Any changes made to "$" files in
-virt-viewer GIT will be overwritten and lost the next time content is imported
-from Zanata.
+virt-viewer GIT will be overwritten and lost the next time content is
+imported from Zanata.
 
 The master "virt-viewer.pot" file is periodically pushed to Zanata to provide
-the translation team with content changes. New translated text is then
-periodically pulled down from Zanata to update the po files.
+the translation team with content changes, using
+
+    make push-pot
+
+New translated text is then periodically pulled down from Zanata to update the
+minimized po files, using
+
+    make pull-po
+
+Sometimes the translators make mistakes, most commonly with handling printf
+format specifiers. The "pull-po" command re-generates the .gmo files to try to
+identify such mistakes. If a mistake is made, the broken msgstr should be
+deleted in the local "$" file, and the Zanata web interface used
+to reject the translation so that the broken msgstr isn't pulled down next time.
+
+After pulling down new content the diff should be examined to look for any
+obvious mistakes that are not caught automatically. There have been bugs in
+Zanata tools which caused messages to go missing, so pay particular attention to
+diffs showing deletions where the msgid still exists in virt-viewer.pot
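
The README text above names printf format specifier mismatches as the most common translator mistake. To illustrate what such a mismatch looks like, here is a small Python sketch that compares the conversions used in a msgid and its msgstr. This is not how the patch detects mistakes (the README relies on re-generating the .gmo files); the helper names `format_specifiers` and `specifiers_match` are hypothetical, and the pattern is a simplified approximation of printf syntax rather than a full parser.

```python
import re

# Simplified printf pattern (an assumption, not a complete grammar):
# optional "N$" position, flags, width, precision, length modifier,
# then the conversion character. "%%" is a literal percent sign.
SPEC = re.compile(
    r"%(?:\d+\$)?"
    r"([-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*))?"
    r"(?:hh|h|ll|l|L|z|j|t)?[diouxXeEfgGcsp%])"
)


def format_specifiers(text):
    """Return the sorted conversions in text, ignoring '%%' and positions."""
    return sorted(
        m.group(1) for m in SPEC.finditer(text) if m.group(1) != "%"
    )


def specifiers_match(msgid, msgstr):
    """True if msgstr is empty or uses the same conversions as msgid."""
    if not msgstr:
        return True
    return format_specifiers(msgid) == format_specifiers(msgstr)


if __name__ == "__main__":
    print(specifiers_match("Connected to %s on port %d",
                           "Verbunden mit %s auf Port %d"))
    print(specifiers_match("%d of %s", "%s"))
```

Positions such as "%1$s" are ignored on purpose, so a translation that reorders its arguments is still treated as a match.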