I'm trying to version control IntelliJ IDEA configuration files. Here's a small sample:
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
  <component name="ChangeListManager">
    <ignored path="tilde.iws" />
    <ignored path=".idea/workspace.xml" />
    <ignored path=".idea/dataSources.local.xml" />
    <option name="EXCLUDED_CONVERTED_TO_IGNORED" value="true" />
    <option name="TRACKING_ENABLED" value="true" />
    <option name="SHOW_DIALOG" value="false" />
    <option name="HIGHLIGHT_CONFLICTS" value="true" />
    <option name="HIGHLIGHT_NON_ACTIVE_CHANGELIST" value="false" />
    <option name="LAST_RESOLUTION" value="IGNORE" />
  </component>
  <component name="ToolWindowManager">
    <frame x="1201" y="380" width="958" height="1179" extended-state="0" />
    <editor active="false" />
    <layout>
      <window_info id="TODO" active="false" anchor="bottom" auto_hide="false" internal_type="DOCKED" type="DOCKED" visible="false" show_stripe_button="true" weight="0.33" sideWeight="0.5" order="6" side_tool="false" content_ui="tabs" />
      <window_info id="Palette	" active="false" anchor="left" auto_hide="false" internal_type="DOCKED" type="DOCKED" visible="false" show_stripe_button="true" weight="0.33" sideWeight="0.5" order="2" side_tool="false" content_ui="tabs" />
    </layout>
  </component>
</project>
Some elements, such as /project/component[@name='ToolWindowManager']/layout/window_info, seem to be saved in an arbitrary sequence every time the IDE saves the configuration. All elements of the same type seem to always have the same attributes in the same sequence. Since the sequence of elements is irrelevant to the functioning of the IDE, it would be useful to sort elements by element name and then by attribute values, while leaving attributes and whitespace in place.
Based on another answer I've gotten to this:
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
  <output method="xml" indent="yes" encoding="UTF-8"/>
  <strip-space elements="*"/>
  <template match="processing-instruction()|@*">
    <copy>
      <apply-templates select="node()|@*"/>
    </copy>
  </template>
  <template match="*">
    <copy>
      <apply-templates select="@*"/>
      <apply-templates>
        <sort select="name()"/>
        <sort select="@*[1]"/>
        <sort select="@*[2]"/>
        <sort select="@*[3]"/>
        <sort select="@*[4]"/>
        <sort select="@*[5]"/>
        <sort select="@*[6]"/>
      </apply-templates>
    </copy>
  </template>
</stylesheet>
It's almost there, but with a few issues:
- It doesn't sort by every attribute value (and @* doesn't work).
- It removes the space before the end of empty elements (<foo /> becomes <foo/>).
- It adds a newline at EOF (which IMO isn't a bug, but makes the resulting file less similar to the original).
Best Answer
I'd tackle it using perl and XML::Twig.
perl has a sort function, which allows you to specify arbitrary criteria for comparing a sequence of values, as long as your function returns positive, negative or zero based on relative ordering.
So this is where the magic happens - we specify a sort criterion that compares elements by name first and then by their attribute values, as the question asks.
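To make the comparator idea concrete, here is a rough sketch of such a comparison function. Note that it is written in Python with the standard xml.etree.ElementTree module, not in perl with XML::Twig, so treat it as an illustration of the idea rather than the code this answer is built on. The name compare_elements and the rule that a missing attribute sorts first are my own assumptions, and the sample is a cut-down version of the layout element from the question.

```python
import functools
import xml.etree.ElementTree as ET


def compare_elements(a, b):
    """Return a negative, zero, or positive number, like a perl sort block."""
    # 1. Compare by element name.
    if a.tag != b.tag:
        return -1 if a.tag < b.tag else 1
    # 2. Compare attribute by attribute, over every name seen on either element.
    for name in sorted(set(a.attrib) | set(b.attrib)):
        in_a, in_b = name in a.attrib, name in b.attrib
        if in_a != in_b:
            # Assumption: an element lacking the attribute sorts first.
            return -1 if in_b else 1
        if a.attrib[name] != b.attrib[name]:
            return -1 if a.attrib[name] < b.attrib[name] else 1
    # 3. Nothing differs, so the elements are treated as equal.
    return 0


snippet = ET.fromstring(
    '<layout>'
    '<window_info id="TODO" order="6" />'
    '<window_info id="Palette" order="2" />'
    '<editor active="false" />'
    '</layout>'
)
# cmp_to_key adapts a negative/zero/positive comparator for sorted().
ordered = sorted(snippet, key=functools.cmp_to_key(compare_elements))
print([(e.tag, e.get("id")) for e in ordered])
# prints [('editor', None), ('window_info', 'Palette'), ('window_info', 'TODO')]
```

This sketch walks attributes in alphabetical order of their names rather than in document order, which is enough to get a stable result when elements of the same type always carry the same attributes.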
It needs to do this recursively across your structure to sort sub-nodes too.
So:
Which given your input, outputs:
No matter what order the various child nodes fall in.
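For readers who want to try the recursive part without perl, the following is a minimal sketch of the same approach in Python using the standard xml.etree.ElementTree module. It is a stand-in, not the XML::Twig code, and the function names sort_key and sort_children as well as the shortened SAMPLE document are assumptions made for illustration. It uses a key function instead of a comparator, which is the more usual way to express this in Python.

```python
import xml.etree.ElementTree as ET


def sort_key(element):
    # Element name first, then (attribute name, value) pairs in name order.
    return (element.tag, sorted(element.attrib.items()))


def sort_children(node):
    # Sort the deepest levels first, then reorder this node's own children.
    for child in node:
        sort_children(child)
    node[:] = sorted(node, key=sort_key)


# Shortened, deliberately shuffled version of the sample from the question.
SAMPLE = """<project version="4">
  <component name="ToolWindowManager">
    <layout>
      <window_info id="TODO" order="6" />
      <window_info id="Palette" order="2" />
    </layout>
    <editor active="false" />
  </component>
  <component name="ChangeListManager">
    <option name="TRACKING_ENABLED" value="true" />
    <ignored path=".idea/workspace.xml" />
  </component>
</project>"""

root = ET.fromstring(SAMPLE)
sort_children(root)
print(ET.tostring(root, encoding="unicode"))
```

After the call, the ChangeListManager component comes before ToolWindowManager, and the Palette window_info comes before TODO, whatever order they started in. Be aware that ElementTree moves the whitespace that follows each element along with it and drops comments by default, so the indentation of the result will not match the original byte for byte.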
I personally like indented_a because it wraps attributes to new lines, and I think that's clearer. But the indented output format could do the same trick.