Behdad Esfahbod's daily notes on GNOME, Pango, Fedora, Persian Computing, Bob Dylan, and Dan Bern!

My Photo
Location: Toronto, Ontario, Canada

Ask Google.

Contact info
Hacker Emblem Become a Friend of GNOME I Power Blogger
follow me on Twitter
July 2003
August 2003
October 2003
November 2003
December 2003
March 2004
April 2004
May 2004
July 2004
August 2004
September 2004
November 2004
March 2005
April 2005
May 2005
June 2005
July 2005
August 2005
September 2005
October 2005
November 2005
December 2005
January 2006
February 2006
March 2006
April 2006
May 2006
June 2006
July 2006
August 2006
September 2006
October 2006
November 2006
December 2006
January 2007
February 2007
March 2007
April 2007
May 2007
June 2007
July 2007
August 2007
September 2007
October 2007
November 2007
December 2007
January 2008
February 2008
March 2008
April 2008
May 2008
June 2008
July 2008
August 2008
October 2008
November 2008
December 2008
January 2009
March 2009
April 2009
May 2009
June 2009
July 2009
August 2009
November 2009
December 2009
March 2010
April 2010
May 2010
June 2010
July 2010
October 2010
November 2010
April 2011
May 2011
August 2011
September 2011
October 2011
November 2011
November 2012
June 2013
January 2014
May 2015
Current Posts
McEs, A Hacker Life
Wednesday, October 29, 2008
 Improving Login Time, Part 1: gnome-settings-daemon

Any true GNOME hacker has to take a shot (multiple shots, mind you) at improving login time. Federico did it a couple years ago, and since I want to be like Federico when I grow up, that's just what I've been doing for the past week, and expect to keep doing for the weeks that come.

How does login work anyway? In short, gnome-session is started and it then in turn reads the list of tasks to start from .desktop files. Each task is marked as belonging to one of a few login phases, in chronological order:The startup phase belongs to gnome-session itself. Initialization, then, is the first phase application can sign up to be run in. To transition to the next phase, the current phase should either complete or time out. A phase is complete when all apps associated with the phase signal completion. An app can signal completion in a variety of ways, the simplest of which being that the process terminates. A phase times out after ten seconds. Description for the phases as well as a more a longer version of this condensed overview is available in gnome-session/gnome-session/README.

The initialization phase is where actual doing stuff begins, and for this blog entry we will focus on that (to be honest, only part of it). The mentioned README has this to add about the initialization phase:
GSM_SESSION_PHASE_INITIALIZATION is the first phase of "normal" startup (ie, startup controlled by .desktop files rather than hardcoding). It covers low-level stuff like gnome-settings-daemon and at-spi-registryd, that need to be running very early (before any windows are displayed).
Before leaving the quotation, notice the emphasis: "before any windows are displayed". We'll get back to that.

The most prominent process started as part of the initialization phase is gnome-settings-daemon. No wonder, as that single module consists of some fifteen plugins that do all kinds of startup. In other words, initialization is gnome-settings-daemon. Lets dive inside.

When Jon McCann and co refactored gnome-settings-daemon into plugins, they also added profiling annotation hooks that can be used for Federico-style timeline plotting. So what I did was rebuilding g-s-d, hooking up strace, and drawing the results. All this on my home machine that is rather beefy and otherwise underutilized.

I did one thing wrong, I forgot to send the strace background, so my mock g-s-d was not terminating and hence kept gnome-session waiting for it for ten seconds and finally giving up on the phase and moving on. The net result was that g-s-d was run solo with no other process racing with it for the CPU. That, and doing warm logins, meant my timings where very predictable and consistently reproducible. In this scenario g-s-d becomes idle in just short of one second. In a more real scenario of sending the strace to the background, it takes more like 2.5s. But that's beyond the point. By letting g-s-d run as fast as it can, it's easier to spot what's slow, as those are sure to stand out.

Before getting any further, I stopped and asked myself "is this worth optimizing?" A good question to ask before any optimization work is to start. My immediate response was: "sure, if we I can cut that 1s in half, that's about 10% saving on a 5s login time". After five seconds of thinking I wondered: "but isn't g-s-d forking and returning in the parent immediately? A .5s saving in g-s-d idle time means little for the login time as a whole." And that's mostly true, but hey, aren't the initialization phase processes supposed to set things that need to be done before any windows are shown? And by forking and returning early, g-s-d is actually not doing that. I can already see how, for example, the xsettings plugin in g-s-d comes up and sets the font rendering settings of the display, causing a redraw in any Gtk+ applications already started. It would be much better to make g-s-d actually do essential initialization as fast as it can, return in the parent, then take its time doing other work that does not have to be done before windows are shown. That's what I'm planning to make it do. In light of that plan, lets see how tight or loose things currently actually are.

Without any further rambling, here is the plot of gnome-session-daemon as shipped in Fedora rawhide starting up. The only modifications I have made is adding more annotations for the plot.
Click to enlarge
After studying the plot and the underlying log and code I identified ten hotspots and plans to fix them. The hotspots are named in the plot. The names are not readable in the small version included here (click for full version), so I have also numbered them:

1. linking: This is pretty much the cost of loading 67 shared libraries that g-s-d currently directly or indirectly links to. There's not much we can do immediately to make the linker faster. We can try to link to fewer libraries however. Seems like we don't really need libgnome. Chopping that spares some 20 of that 67.

Resolution: mccann already filed Bug 557808 – don't use libgnome.

2. gtk_init: There's not much to do here right away, except that I want to profile gtk+ initialization sometime to see if we can improve it. It's not long, but any saving benefits every application, so it's worth pursuing. No resolution.

3. fontconfig_monitor: This one is so embarrassing. It's my single commit to g-s-d, and it takes the longest time in the plot. What this code does is to add gio/inotify monitors on all font directories and configuration files known by fontconfig for change notification so it can 1) rebuild the fontconfig cache and 2) signal applications to reload their font configurations.

Now, it's a good idea to make sure fontconfig cache is current before other applications start and each try to rebuild the cache, but installing the monitors can wait. It's one of those things that is equally as good if done 10s into the login.

Resolution: Only check that fontconfig cache is current. Defer installing file monitors to idle time.

4. mkfontdir: This one's so bogus. We scan two directories and cleanup symlinks for "cursor fonts" we may have created before. Then if there's any cursor font set in gconf, we create a symlink for it, then call mkfontdir (that in turn calls mkfontscale, which does a bunch of stats on nonexisting files and directories, etc) and get and set the X server font path. All that work even if there is no cursor fonts set..

Resolution: Skip spawning mkfontdir and setting the server font path if there is no cursor font set.

5. mousetweaks: This one's my favorite. According to the man page, "mousetweaks is a daemon that provides various mouse features for the GNOME desktop. It depends on the Assistive Technology Service Provider Interface (AT-SPI)." What the g-s-d plugin does is to monitor the relevant gconf keys, and start/stop the mousetweaks daemon on demand.

On a typical desktop with no tweaks configured (%99+), it spawns "mousetweaks -s", which means "stop the running daemon, if any". The mousetweaks process then starts up, initializes a bunch of stuff, including the a11y stuff (not the fastest stuff it seems), tries to find a running daemon, fail, and silently exit. So much for so little.

Resolution: Don't spawn mousetweaks if no tweaks configured. In other words, don't spawn "mousetweaks -s" unless we know a daemon is running.

6. init_kbd: This one was harder to figure out. There was no big fat thing going on. Instead, there is a look over some 20 different media keys, for each of them some gconf reading and a grab_key operation.

The gconf stuff as my plot agrees is not the bottleneck as the code already does a one-level-deep preloading on the gconf directory. The grab_key invocations however each take real time, and they add up. Looking into what grab_key does is revealing: for each combination of the ignored modifiers, for each screen, it does the usual "push error handler on display; do something with display; flush the display; pop and see if any errors happen". Multiplied by the number of keys, that's a bunch of X display flushes while we're not really interested in pass/fail status of individual operations.

Resolution: Do one "push; do; flush; pop" instead of many.

7. acme_volume_new: What's happening here is that to be able to control volume and other mixer properties, we end up initializing gstreamer. Which in turn wants to ensure that its binary cache of plugins is up to date, so it forks and stats all the plugins. Ouch!

Now, making sure the gstreamer cache is up to date before every other application starts using it is a good idea, but it doesn't have to be so painful!

Whether it's that no one has got to fix it yet, or if there's good reasons for the binary cache not to simply store the timestamps of the folders and compare that instead of doing stat on every plugin on every startup of every gstreamer-using application, I don't know. I won't judge. Love to hear the issues. But experience with Pango and fontconfig tell me that a more decent cache can be done and indeed should be done.

The fontconfig cache, admittedly, becomes really hairy at times (time skews, anyone?), but much, much, much much, better than if fontconfig stated any and every font on startup!

I also have no idea why it gstreamer forks for the cache check. I could think of not polluting the current process or risk crashing it. BUT! If the forked process fails validating the cache, it then retries in process! Oh well...

Resolution: Awesome gstreamer hackers, please fix your cache! In the mean time, forcing gstreamer to not fork may help (can be done by setting an env var).

8. gnome-screensaver: Not sure why gnome-screensaver is so heavy to start up. That's for another session. But, who cares if gnome-screensaver starts 10s into the session? Right, you got it, it does not have to be started before all other windows.

Resolution: Start at idle time.

9. clipboard_manager: This one also baffles me. It's a bunch of X roundtrips. Shouldn't take that long. Anyway, given how clipboard managers work (they are useful when the app holding the clipboard content is existing), no one would really notice if we started it 10s into the session.

Resolution: Start at idle time.

10. xrdb: The xrdb brokenness (it calls gcc!) is a well-known and well-studied issue. According to mclasen the only reason we kept doing it was xemacs, but allegedly that uses Gtk+ these days. Is there any other reason we should be doing xrdb in g-s-d in 2009?

Resolution: mccann filed Bug 557807 – disable xrdb plugin by default. Isn't he awesome?

That's tonight's ten commandments, err, resolutions. I have already started hacking the new architecture in g-s-d and patching the plugins as described above. There are more plugins, and each can use a quick remove. In general any plugin that does "set XYZ and hook up for change notifications on it" can be split up to do the change-notification hookup at idle time if that part consumes considerable time. That pretty much rounds it up for now.

Labels: , ,

I love seeing things like this, it's what makes open source so awesome. If every gnome developer could save half a second of time somewhere then think how much better gnome would be!
About starting gnome-screensaver at idle time, I saw the use-case mentioned of not using gdm, but directly starting X into the default or last user's session and immediately block with the screensaver in order to make sure the right person is sitting in front of the desktop.
About linking: would it be possible to use dlopen or such thing to defer linking to libraries that are not needed for the pre-window-display phase?
Blah. I wrote a big comment about the GStreamer registry forking, and then google ate it.

Anyway, there's something about modified plugins in a subdir not changing the mtime of the subdir, so we have to stat the plugin files themselves.

It forks to avoid loading hundreds of shared libraries into the first process that happens to run after an upgrade and bloating that processes' memory size. Ideally, it would also insulate against crashing plugins.

I have half a rewrite of the registry scanning I've been doing over the past few weeks that addresses the crashing problem and which hopefully makes things faster by not forking a sub-process unless it is needed - by stat'ing things in the parent before forking.
Your "fix" for init_kbd is a work-around and not suitable for upstream. If we do what you mention, how will we know which key is already captured?

We could do this once per config option, instead of once per key/modifier combo.
Also, I'm not sure what else acme_volume_new is doing to occupy that much time... here running gst-launch-0.10 with no arguments (which is basically gst_init then exit) takes at most around 140ms with clean caches, and at most 100ms with dirty caches.

Forking vs. not-forking seems to be around 10ms.

What sort of times do you get for just 'time gst-launch-0.10 > /dev/null 2>&1' ?
GNU Emacs now uses gtk, but it still respects xresource settings, and I believe that they're the preferred way of setting some options, eg default font and frame size. If the xrdb plugin is disabled, how would one re-enable it?
[note: contains some bikeshedding, but anyway...]

"I can already see how, for example, the xsettings plugin in g-s-d comes up and sets the font rendering settings of the display, causing a redraw in any Gtk+ applications already started."

This reminds of something I've seen a few times when booting some Ubuntu system inside Virtualbox on a slow machine: if the whole system is slow enough and Gnome startup already takes a while, it will first display the Gnome-themed icons and then (after maybe some seconds) replaces all displayed icons with the correct Tango-themed icons... If it secretly does that on every Gnome start, that would mean it wastes time loading the unused Gnome icon theme.

Re mousetweaks: have you checked why the "mousetweaks -s" call was added originally? It sounds like this would be useful to kill a running mousetweaks daemon after switching users or the like.

Re all the "Start at idle time" resolutions: won't that mean that after Gnome has started up and seems to have "quieted down", it will suddenly start working again and doing all those deferred tasks (and start churning on disk and using CPU time)? I've noticed that on KDE 3.5, when the startup sound starts playing, I know that there will be absolutely no more startup work to be done. That's a quite nice way for the computer to say "Dear user, I'm ready, and now you really have my full attention" :-)
(Agreed, deferring this stuff is still much better than running it directly during startup, but still...)
The X resources were difficult to control 15 years ago. The mechanism was broken and 'fixed' so many times (at the levels of X, Desktop and Linux distributions) that it is now totally impossible to manage properly for a normal human being.

$HOME/.Xdefaults does not even work anymore on most debian based system...

Anyways, not running xrdb during Gnome startup is probably a good thing. Just let the few peoples still using Xt applications do it manually (in their .xsession or .xprofile) if they need it

Peter, most of the setting provided by the X resources can also be set from the .emacs file

This is what I have in mine:

(if (eq window-system 'x)
(if (x-display-color-p)
(modify-frame-parameters nil '((wait-for-wm . nil))) ;; prevent slow start
(set-background-color "#FFEED0")
(set-foreground-color "black")
(set-frame-font "9x15")
(set-cursor-color "red")
(set-mouse-color "blue")
About the screensaver, we used to start it with a delay of 30 seconds, but when moving to the extensible plugins system, we moved it back to start synchornously, so just submitted a patch to start it back in an idle callback
I don't have anything against xemacs users, but from the marketing/global vision of spreading gnome to the general public it's a shame that we have a login delay in a desktop supposed to be used by million people just for giving support to an specific application used by some tech-savvy people (which is not the people the gnome desktop aims to)...
It sounds to me like many of those tasks items would benefit from being started in parallel. Stuff that round-trips to the X server and stuff that does disk I/O in particular.

Of course, that might add a lot of complexity, but most of the tasks seem to be independent.
(tried posting this before but it seems to have been eaten?)

re: init_kbd: gdk_error_trap_push/pop don't let you figure out which call failed, but g-s-d could create its own error trap and use NextRequest() and XErrorEvent->serial to match up the errors with the requests that caused them.
You rock! Will you do an analysis of a cold start also?
Thanks everyone for the very useful comments!

@fluteflute: Glad you liked it.

@Michael: Good point. For that case the screensaver daemon will be run with an argument to lock I believe, so that's a different code path and not affected by this change.

@SteveF: We'll just get rid of them. They are not needed. In fact, this is committed now.

@JanSchmidt: Thanks. I'll follow up on gstreamer-devel list.

@hadess: The code does not care if the key is already captured! Sure, it writes a warning, but doesn't do anything else. Am I missing something?

@JanSchmidt: I'm talking about 35ms here.

@Peter: Don't know off the top of my head. If we disable it, we'll make sure it can easily be enabled. I think it's as simple as editing a text file.


1) Yes, the icons changing after a few seconds sounds like the issue I speculated.

2) The "mousetweaks -s" gets called because the cod handling the initial setting of the daemon and handling changes is the same. So it doesn't know that there is no daemon running. The daemon is a per session thing, so other users or previous sessions are not relevant.

3) Well, yeah, the effect I'm seeking is that your desktop appears fully within, say, 2 seconds, then there's a few more seconds of some stuff happening in the background. Hopefully, not CPU or I/O extensive, so user does not have to notice.

@Anonymous: Thanks.

@rodrigo: I see. Bug number maybe? Not that I can't find... Being laze :).

@Nelsulo: Agreed. That's what I'm trying to fix these.

@NiklasHöglund: Parallelism would help with X round trips theoretically. But not with Xlib. We need to move to xcb, then use callbacks for request completions. That would work great, but not quite there just yet.

@DanWinship: As the code is right now, it doesn't care which requests failed (except for doing a warning). The trap is basically there to keep the error from propagating further.

@AndersNorgaard: Thanks!
Err, missed one answer.

@AndersNorgaard: Yes, I'm hoping to do cold start and start with preload too. Will post.
thank you for this very interesting work!

May I ask you what kind of tools did you use to perform such profiling?

It's a poor man's profiling tool. See:
Thanks behdad.

Would it be possible to perform such analysis with more suitable tools such as SystemTap or LTTng?

@Stefano: Not really. All-purpose profilers and analysis tools are great at telling you where you are doing work that you shouldn't be doing ("why are we calling this function so many times?", "why are we spending so much time here?", ...), but not very good at spotting stalls. At least, I don't know how to use them for such things as productively as annotating the source allows.

But then again, the source code was already annotated. All I had to do was to compile it with debugging enabled.
I'd love to see updated graphs regularly. This is great stuff :)
"Any true GNOME hacker has to take a shot (multiple shots, mind you) at improving login time."

That kind of stuff could be completely unnecessary if the login time was measured periodically (e.g. once a week or daily). IIRC, Mozilla does it.

If a machine was set-up running current development version and the login times were measured (it's important that all modules like Tomboy were enabled there - users actually use them :) ), it would have a very positive impact on the Gnome's usability :)

It would allow to experience something like: "Hmm, today's login takes 2 seconds more. What was done that made that worse?"

Anyway, "Improving Login Time" is a great series and I hope there will be part 4, 5 and 6 (and there it would stop, because it'll be 3 seconds there :) ).

Also I hope much of this work will go upstream, so other distros than Fedora (unfortunately I use the other one) get this hotness too. Best regards!
Mateusz, what you suggest does help for detecting regressions, and is generally a very good idea. Just if we had the resources, ie. someone to actually set it up...

However, some of the stuff I'm fixing (for example all the work in the upcoming post) is stuff that has been suboptimal for years. They are not regressions, just how things where implemented in the first place.

Oh, and *all* the work I do is in upstream. I first submit to upstream bugzilla and work with maintainers to commit them, only then push them in Fedora.

/me goes finishing his patches

Post a Comment

Links to this post:

Create a Link

<< Archive
<< Home