Thursday, January 19, 2012

Import FLA?

I have wanted Lightningbeam to be able to import .fla files for a while, as it would make it relatively easy to access old projects. The problem? While the .swf file format is well known, the .fla format is virtually undocumented. It is also a binary blob, incomprehensible apart from a few strings of ActionScript. Here is the progress I have made so far.

After some research, I found that there are in fact two completely different types of .fla file: the new CS5 .fla, and the old Flash MX - CS4 .fla. The new format is in fact just a simple zipped folder structure, with attributes in XML. The old one is the undocumented binary blob. Sadly, the vast majority of .fla files in existence are old-type, and not having any new-type files to work with, I decided to start with the old version.
After a good bit of research, I discovered a vital piece of information: Old-style .fla files in fact use the Microsoft Compound Binary File format. That page suggests using 7zip to extract the .fla; however, the Mac/Unix version of 7zip (p7zip) refuses to do so. I suppose it is a special feature compiled into the Windows version.
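Since the two kinds of .fla are different containers, they can be told apart from their first few bytes. Here is a minimal sketch, assuming a CS5 file begins with the usual ZIP local-file header and an old-style file begins with the standard compound-file signature. The function names are my own, and the code is written for Python 3:

```python
# Signature of a Microsoft Compound Binary (OLE2) file.
OLE2_MAGIC = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
# Signature of the first local file header in a ZIP archive.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_fla(header):
    """Guess the container type from the first bytes of a .fla file."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"      # old-style binary .fla
    if header.startswith(ZIP_MAGIC):
        return "zip"       # new-style zipped XML .fla
    return "unknown"


def sniff_fla_file(path):
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return sniff_fla(f.read(8))
```

Calling `sniff_fla_file('exp.fla')` should return `"ole2"` for a file that passes the `isOleFile` check.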
After some more research, I found a Python module for interacting with MCBF (also apparently called OLE2) files. Installation was painless, so I decided to run some tests on the first .fla I had lying around:

$ python
Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import OleFileIO_PL
>>> assert OleFileIO_PL.isOleFile('exp.fla')
>>> ole = OleFileIO_PL.OleFileIO('exp.fla')
>>> print ole.listdir()
[['Contents'], ['Media 1'], ['Media 2'], ['Media 3'], ['Media 4'], ['Page 1'], ['Symbol 1'], ['Symbol 2'], ['Symbol 3'], ['Symbol 4']]
>>> f = open('Contents.bin', 'wb')
>>> f.write(ole.openstream('Contents').read())
>>> f.close()
>>> f = open('Media1.bin', 'wb')
>>> f.write(ole.openstream('Media 1').read())
>>> f.close()
>>> f = open('Page1.bin', 'wb')
>>> f.write(ole.openstream('Page 1').read())
>>> f.close()
>>> f = open('Symbol1.bin', 'wb')
>>> f.write(ole.openstream('Symbol 1').read())
>>> f.close()
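The manual extraction above could also be written as a loop over every stream. This is only a sketch: `dump_streams` is a hypothetical helper of my own, written for Python 3, and it assumes nothing about `ole` beyond the `listdir()` and `openstream()` methods used in the session. Output files are opened in binary mode so the bytes are written unchanged on every platform.

```python
import os


def dump_streams(ole, out_dir):
    """Write every stream in `ole` to out_dir.

    Returns a dict mapping each stream path to the file it was written to.
    """
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    written = {}
    for entry in ole.listdir():          # each entry is a list of path parts
        name = "/".join(entry)
        # 'Media 1' becomes 'Media1.bin', matching the names used above.
        filename = "_".join(entry).replace(" ", "") + ".bin"
        path = os.path.join(out_dir, filename)
        with open(path, "wb") as f:
            f.write(ole.openstream(entry).read())
        written[name] = path
    return written
```

With the session above, `dump_streams(ole, 'exp_streams')` would be the call; the directory name is just an example.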
I now had four files, acting as representatives for the different types I was likely to encounter. I opened up the 'Contents' file, and found...binary gobbledegook. But it wasn't all gobbledegook; it had some things that were definitely resource links ('Media 1???bottledwater.png???A/Users/greenandsave/Public/Drop_Box/recycling/#1/bottledwater.png??rL??sL???????????????????????2?'), some that looked like property name/value pairs (':Quality???4???Vector::FireFox???0???"PublishRNWKProperties::exportAudio???1??? PublishRNWKProperties::speed384K???0???!PublishRNWKProperties::exportSMIL???1???"PublishGifProperties::DitherOption??????-PublishFormatProperties::generatorDefaultName???1???'), and at the end: XML. Not very much, though.
<?xml version="1.0" encoding="UTF-16" standalone="no" ?>
<mobileSettings>

  <contentType id="standalonePlayer" name="Standalone Player"/>

  <testDevices/>

</mobileSettings>
The first resource link led me to guess that Media1.bin contained some sort of PNG file. Unfortunately, it wasn't just a PNG file, as I discovered after changing the extension to .png and trying to open it: either it is not a pure PNG, or it is missing some headers. More research on that to follow.
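One quick way to test the "PNG with something extra in front" guess is to search the whole blob for well-known image signatures instead of only looking at the start. The sketch below is my own helper, not part of any library, and it covers just three formats. An empty result would not prove there is no image data; the pixels could be compressed or stored in a Flash-specific layout.

```python
# Magic numbers that mark the start of common image formats.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
}


def find_signatures(data):
    """Return a sorted list of (offset, format name) for every signature hit."""
    hits = []
    for name, magic in SIGNATURES.items():
        start = 0
        while True:
            pos = data.find(magic, start)
            if pos == -1:
                break
            hits.append((pos, name))
            start = pos + 1
    return sorted(hits)
```

For example, `find_signatures(open('Media1.bin', 'rb').read())` would list any offsets worth inspecting by hand.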

Abandoning the media for the time being, I opened up Symbol1.bin. It was a good deal smaller, in mostly-pure binary:
?CPicPage?? CPicLayer?? CPicFrame????6b?-Z+?[?@?@4???l???????(??O????<T?
z?0??=??????^?π???\????O4??4?,9?p??8?_f?Z.zF?3u?.??jp?Q;?04W?u??h41?4?????)wC?                                                                              ?e?H??????????Y????????                     ???Layer 1????O?O???
I am guessing that that "Layer 1" in there refers to the name of the layer and not the layer itself, as Flash allows you to rename layers and it would make sense if it stored the names in plain text. This file is small enough that it should be possible to reverse-engineer.
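To pick out readable names such as "CPicPage" or "Layer 1" together with their offsets, a small equivalent of the Unix `strings` tool is enough. This is a sketch of my own that only finds runs of printable ASCII; if some names turn out to be stored as UTF-16, it will miss them.

```python
import re


def ascii_strings(data, min_len=4):
    """Return (offset, text) for each run of printable ASCII in `data`."""
    # Printable ASCII is the range from space (0x20) to tilde (0x7e).
    pattern = re.compile(rb"[ -~]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii"))
            for m in pattern.finditer(data)]
```

Knowing the offsets makes it easier to line up the bytes just before and after each name when comparing two files that differ by a single change.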
Lastly, I opened up Page1.bin, from which I had no idea what to expect. It starts off with a mixture of binary and XML:
?CPicPage?? CPicLayer?? CPicFrame????????????kD????????
                                                               ???Layer 4?????O??????
CPicBitmap
? M???,???(????a???????????????m?;*?0(?0?m0???????????Ky????????? CPicShape???E@?|>????|????3??????????s??U?>0?0??0?U ???????????????????????
                                                                       ???Layer 1????O?O?????
CPicSpritet -
}???????<component metaDataFetched='true' schemaUrl='' schemaOperation='' sceneRootLabel='Scene 1' oldCopiedComponentPath='1'>
</component>
?? ???O??????m
}???????<component metaDataFetched='true' schemaUrl='' schemaOperation='' sceneRootLabel='Scene 1' oldCopiedComponentPath='7'>
</component>
And then there are about 8 KB of mostly blank spaces and newlines, with one character per line. A sample:
   ?
    K
     ?
      X
       !
        X
         5
          X
           5
            X
             X
              0
               ?
                ?
I have no idea what it indicates.
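A text editor is probably the wrong tool for a region like this, since control bytes get rendered as spaces and line breaks. A hex dump shows the actual byte values instead. Here is a minimal sketch of one, written for Python 3 (where iterating over bytes yields integers); the layout is my own choice:

```python
def hexdump(data, width=16):
    """Format `data` as lines of offset, hex bytes and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)
        # Show printable bytes as characters and everything else as a dot.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%08x  %-*s  %s" % (offset, width * 3 - 1, hex_part, text))
    return "\n".join(lines)
```

Dumping a few hundred bytes from the middle of Page1.bin this way should make any repeating pattern in that region easier to spot.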
Finally, it ends with more binary/XML:
}???????<component metaDataFetched='true' schemaUrl='' schemaOperation='' sceneRootLabel='Scene 1' oldCopiedComponentPath='6'>
</component>
????????????????????
                    ???Layer 2?????3???? ????$????????????1???stop();??????~?????????????'?<Y?P??<8H???q8o?Z<??V??<V?T?$?8???q
                                                     ??q???%?P?????????????}???stop();?????
           ???Layer 3??????O???
So, to wrap this up: The .fla file format is quite complicated. But it has a structure, and I believe I can recover many attributes from it. I still need to work out the format of the "Media X" files, which I will attempt by comparing them with the original images.
Is FLA import coming to Lightningbeam? Probably, but not right away.

7 comments:

  1. Hey man, it's me, digitalseraphim again. Thought I would drop in and see how you were doing and stumbled upon this and am *very* interested in helping out. I have 010 editor which is extremely helpful for reverse engineering binary files, and was thinking of looking at 'fla's myself, but seeing how you've already made some progress, I'm wondering if I could help you out. Going to make a simple fla (got a demo of Flash for the next 28 days or so) and see what I can do. I'm on windows, so maybe 7zip will be able to handle it here.

  2. Cool! I think the most important thing to try to extract right now is the shape data of symbols and how the animations work - from there we would be able to do at least a basic import of FLA files.

  3. So far, I have discovered that Windows 7zip does handle the fla files. I just have to rename them (I'm using an extension of ole2, but others might work). I'm currently working on the Contents file, which contains (at a minimum) a section with all of the document settings. It looks like it might be the output of some serialization library, potentially a well-known one, just not known to me. Values seem to be "bookended" by (potentially) FF FE FF, though it's possible that they start with FE FF and end with FF; still working on this. Trying to find a count of items: I have 2 empty files where the number of text-keyed values is 170 (0xAA); however, there is a blob of around 430 bytes that I can't decode. I will put this file aside for a bit, add a shape to the fla, and see what I can figure out.

  4. There is a wiki page that Benjamin put up (http://wiki.benjaminwolsey.de/FLA_Format), which mentions that FF FE FF is the divider.
    Also - the number of 'Symbol X' files is probably a good count. I've been doing a bit more research in the meantime as well with basic files, which I will post soon.

  5. fla files are proprietary, I think. You would likely be risking a lawsuit if you include fla support in your project.

    Replies
    1. It is true that .fla is a proprietary format; however, I am not decompiling Flash or anything, but rather reverse-engineering the format from the end product only. This should not be any more of a problem than OpenOffice supporting Word docs or Gimp supporting PSDs.
