LIP File Format

LIP files are talking-head lip-sync files. They do not contain audio or image pixels. A LIP file stores a timed sequence of phoneme codes; the dialogue system uses those codes to choose frames from the current talking head's phoneme FRM while playing the matching speech ACM.

In Fallout 2 dialogue, the lookup chain starts in a MSG file. The MSG entry's audio field is passed to the lip-sync system, the current talking-head art base name selects a speech subdirectory, the LIP file supplies mouth timing, and the matching ACM file supplies sound.

Location and lookup

Speech assets are stored below SOUND\SPEECH. For a talking head with art-list base name MYRON and a MSG audio field myron01, the dialogue code looks for:

SOUND\SPEECH\MYRON\myron01.LIP
SOUND\SPEECH\MYRON\myron01.ACM

| Input | Source | Use |
| --- | --- | --- |
| Head base name | art\heads\heads.lst, through the current talking-head FID. | Speech subdirectory under SOUND\SPEECH and the base for talking-head FRM names. |
| Audio token | Second field of the dialogue MSG entry. | Initial LIP filename. Any extension in the token is stripped. |
| LIP internal audio name | The fixed 8-byte name field inside the LIP file. | Final ACM filename used by the lip-sync loader. |

The internal audio name matters. Fallout 2 CE copies the MSG audio token into the LIP state before opening the file, but after the LIP header is read it uses the LIP file's own 8-byte name field when loading the ACM. For normal assets these names match. Tools should warn when they do not.
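
The lookup and the mismatch warning can be sketched as follows. This is a minimal illustration, not engine code: the function names `speech_paths` and `name_mismatch` are hypothetical, and the case-insensitive comparison is an assumption made for the sketch.

```python
def speech_paths(head_base, audio_token):
    """Return (lip_path, acm_path) for a head base name and a MSG audio token."""
    # Any extension in the MSG token is stripped before the lookup.
    stem = audio_token.rsplit(".", 1)[0]
    base = "SOUND\\SPEECH\\" + head_base + "\\" + stem
    return base + ".LIP", base + ".ACM"


def name_mismatch(audio_token, internal_name):
    """True when the LIP's 8-byte name field differs from the MSG token."""
    stem = audio_token.rsplit(".", 1)[0]
    # The name field is a fixed-size, null-terminated byte array.
    internal = internal_name.split(b"\0", 1)[0].decode("ascii", "replace")
    # Assumption: names are compared without regard to case.
    return internal.lower() != stem.lower()
```

A tool would call `name_mismatch` after reading the header and warn when it returns True, because the ACM that actually gets loaded is named by the LIP file rather than by the MSG token.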

Binary properties

| Property | Description |
| --- | --- |
| File type | Binary, fixed header followed by variable-length phoneme and marker arrays. |
| Version | Version 2 is the Fallout 2 asset format documented here. Fallout 2 CE also contains a reader for an older version 1 layout. |
| Integer byte order | 32-bit integers are read through the normal Fallout file helpers, which store high byte first. |
| Strings | Fixed-size byte arrays. Names are expected to be short, null-terminated game strings. |
| No compression | The LIP file itself is not compressed. The paired speech audio is an ACM file. |

Version 2 layout

All numeric fields in this table are 32-bit signed values as read by the engine, except for the phoneme byte array and fixed strings.

| Offset | Size | Field | Description |
| --- | --- | --- | --- |
| 0x00 | 4 | version | File version. Fallout 2 LIP assets use 2. |
| 0x04 | 4 | unknown_04 | Usually 0x00005800. CE reads it as field_4. Version 1 data is normalized to this value after loading. |
| 0x08 | 4 | flags | Stored flags. Runtime playback uses low bits for internal play state, so files should normally store 0. |
| 0x0C | 4 | unknown_0C | Read by the engine, not used for the normal frame selection path. |
| 0x10 | 4 | decoded_audio_length | Traditional docs describe this as the unpacked ACM length. CE stores it as field_1C and uses it only to compute an average marker spacing after sound load. |
| 0x14 | 4 | phoneme_count | Number of one-byte phoneme codes following the header. |
| 0x18 | 4 | unknown_18 | Read by the engine, not used for normal playback. |
| 0x1C | 4 | marker_count | Number of marker records following the phoneme array. |
| 0x20 | 8 | audio_name[8] | Base filename for the matching speech ACM. Keep it to at most 7 visible bytes plus a null terminator for original-tool compatibility. |
| 0x28 | 4 | audio_ext[4] | Historical extension field. Older docs list VOC; CE reads it, then overwrites the runtime value with ACM. |
| 0x2C | phoneme_count | phonemes[] | One byte per phoneme code. Valid codes are 0 through 41. |
| 0x2C + phoneme_count | marker_count * 8 | markers[] | Each marker is two int32 values: marker type and decoded-audio position. |

The expected file size for version 2 is 0x2C + phoneme_count + marker_count * 8. There is no checksum or footer.
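
As a rough sketch of the header layout and the size formula, the fixed part can be unpacked with Python's `struct` module. The helper name `parse_v2_header` and the returned dictionary keys are hypothetical; only the offsets and sizes come from the table.

```python
import struct

# Eight big-endian int32 fields (0x00..0x1F), then audio_name[8] and audio_ext[4].
V2_HEADER = struct.Struct(">8i8s4s")
assert V2_HEADER.size == 0x2C


def _text(raw):
    """Decode a fixed-size, null-terminated byte field."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def parse_v2_header(data):
    """Unpack the version 2 header and compute the expected file size."""
    (version, unknown_04, flags, unknown_0c, decoded_audio_length,
     phoneme_count, unknown_18, marker_count,
     audio_name, audio_ext) = V2_HEADER.unpack_from(data, 0)
    if version != 2:
        raise ValueError("not a version 2 LIP header: version=%d" % version)
    if phoneme_count < 0 or marker_count < 0:
        raise ValueError("negative array count in header")
    return {
        "flags": flags,
        "decoded_audio_length": decoded_audio_length,
        "phoneme_count": phoneme_count,
        "marker_count": marker_count,
        "audio_name": _text(audio_name),
        "audio_ext": _text(audio_ext),
        # 0x2C header bytes + one byte per phoneme + eight bytes per marker.
        "expected_size": 0x2C + phoneme_count + marker_count * 8,
    }
```

Comparing `expected_size` with the actual file length is the only integrity check available, since the format has no checksum or footer.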

Marker records

| Relative offset | Size | Field | Description |
| --- | --- | --- | --- |
| 0x00 | 4 | type | Expected to be 0 or 1. The playback code validates and logs unexpected values, but frame selection uses only the position and phoneme arrays. |
| 0x04 | 4 | position | Position in the decoded speech stream. Playback advances through markers by comparing this value with the current sound position. |

The first marker should have position 0 and a marker type of 0 or 1. Later marker positions should be monotonically non-decreasing. CE logs invalid marker type and decreasing-position problems, but authoring tools should treat them as errors because they can desynchronize or break mouth playback.
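
An authoring tool could apply these rules with a check along the following lines. This is a sketch: `marker_problems` is a hypothetical name, markers are assumed to be already decoded into `(type, position)` tuples, and treating an empty marker list as a problem is a choice of this sketch rather than documented engine behavior.

```python
def marker_problems(markers):
    """Return a list of human-readable problems for (type, position) tuples."""
    if not markers:
        # Assumption: a file with no markers is reported rather than accepted.
        return ["file has no markers"]
    problems = []
    if markers[0][1] != 0:
        problems.append("first marker position is %d, expected 0" % markers[0][1])
    previous = None
    for index, (marker_type, position) in enumerate(markers):
        if marker_type not in (0, 1):
            problems.append("marker %d: unexpected type %d" % (index, marker_type))
        # Equal positions are allowed; only a decrease is an error.
        if previous is not None and position < previous:
            problems.append("marker %d: position %d is below previous %d"
                            % (index, position, previous))
        previous = position
    return problems
```

An empty result means the marker array satisfies the rules above; a tool that follows the advice in this section would refuse to write a file for which the list is non-empty.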

Traditional TeamX/Anchorite documentation describes marker positions as decoded sample offsets and gives the authoring rule position = time_seconds * sample_rate * 4 for 22,050 Hz speech. In practice, a tool should align marker units with the sound API used by the target engine and validate against in-game playback.
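
The traditional rule is simple arithmetic and can be written out as below. The function name and parameter names are hypothetical, the default rate assumes the standard 22,050 Hz value, and the factor of 4 is taken from the traditional documentation rather than verified against any particular engine.

```python
def marker_position(time_seconds, sample_rate=22050, unit_scale=4):
    """Convert a time in seconds to a marker position using the traditional rule."""
    # position = time_seconds * sample_rate * 4, rounded to a whole unit.
    return int(round(time_seconds * sample_rate * unit_scale))
```

With the defaults, a marker one second into the speech lands at position 88200 and one at half a second lands at 44100. Exposing the rate and scale as parameters makes it easy to adjust them if in-game playback shows the units are off.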

Runtime playback

The dialogue UI does not use every MSG lookup as speech. Fallout 2 CE asks for speech when rendering the NPC reply line, but option text is fetched without starting speech. When speech is requested and the MSG audio field is non-empty, the game loads the LIP/ACM pair, starts the sound, then calls the LIP ticker while the dialogue window is active.

During playback, the ticker asks the sound system for the current decoded position, walks forward through marker records, copies the corresponding phoneme code into the current phoneme state, and redraws the talking head when the phoneme changes. When the speech sound stops, the dialogue code ends lip-sync playback and redraws the head at frame 0.

The LIP marker type is not part of that frame-selection path. It is useful authoring metadata, but mouth movement is driven by marker positions and phoneme codes.
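
The marker walk can be modeled with a small state object. This is a simplified model of the behavior described above, not a transcription of CE: the names `LipState` and `tick`, the strict greater-than comparison, and the bounds guards are all assumptions of the sketch.

```python
from dataclasses import dataclass


@dataclass
class LipState:
    phonemes: bytes          # one code per entry, from the LIP file
    positions: list          # marker positions, in decoded-audio units
    marker_index: int = 0    # next marker that has not been passed yet
    current_phoneme: int = 0


def tick(state, sound_position):
    """Advance past every marker the sound has passed.

    Returns True when the current phoneme changed, which is when the
    talking head would need a redraw. Marker types are never consulted.
    """
    before = state.current_phoneme
    while (state.marker_index < len(state.positions)
           and state.marker_index < len(state.phonemes)
           and sound_position > state.positions[state.marker_index]):
        state.current_phoneme = state.phonemes[state.marker_index]
        state.marker_index += 1
    return state.current_phoneme != before
```

Calling `tick` repeatedly with increasing sound positions reproduces the effect described above: the phoneme only changes when a marker position is crossed, and several markers can be consumed in one call if the ticker runs late.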

Counts and alignment

Classic docs describe marker_count as usually one more than phoneme_count, with an initial zero-time closed-mouth marker. Fallout 2 CE reads both counts independently and does not enforce that relationship. The playback loop indexes the phoneme array by marker progression, so mismatched counts are dangerous even when the loader accepts the file.

Phoneme codes

Fallout 2 CE defines PHONEME_COUNT as 42. Codes 0x00 through 0x29 are valid. During dialogue rendering, each code is mapped to one of the first nine frames in the selected phoneme FRM.

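| Code | FRM frame | Code | FRM frame | Code | FRM frame |
| --- | --- | --- | --- | --- | --- |
| 0x00 | 0 | 0x0E | 1 | 0x1C | 2 |
| 0x01 | 3 | 0x0F | 7 | 0x1D | 2 |
| 0x02 | 1 | 0x10 | 7 | 0x1E | 2 |
| 0x03 | 1 | 0x11 | 6 | 0x1F | 2 |
| 0x04 | 3 | 0x12 | 6 | 0x20 | 6 |
| 0x05 | 1 | 0x13 | 2 | 0x21 | 2 |
| 0x06 | 1 | 0x14 | 2 | 0x22 | 2 |
| 0x07 | 1 | 0x15 | 2 | 0x23 | 5 |
| 0x08 | 7 | 0x16 | 2 | 0x24 | 8 |
| 0x09 | 8 | 0x17 | 4 | 0x25 | 2 |
| 0x0A | 7 | 0x18 | 4 | 0x26 | 2 |
| 0x0B | 3 | 0x19 | 5 | 0x27 | 2 |
| 0x0C | 1 | 0x1A | 5 | 0x28 | 2 |
| 0x0D | 8 | 0x1B | 2 | 0x29 | 8 |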

Invalid phoneme codes are only logged by CE during load. They can later be used as indexes into the phoneme-to-frame table, so a robust validator should reject any code above 0x29.
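
The mapping and the stricter validation can be expressed as a lookup list. The values below are copied from the table in this section; the names `PHONEME_TO_FRAME`, `frame_for_phoneme`, and `invalid_phonemes` are hypothetical.

```python
# Index is the phoneme code (0x00..0x29), value is the FRM frame (0..8).
PHONEME_TO_FRAME = [
    0, 3, 1, 1, 3, 1, 1, 1, 7, 8, 7, 3, 1, 8,   # 0x00..0x0D
    1, 7, 7, 6, 6, 2, 2, 2, 2, 4, 4, 5, 5, 2,   # 0x0E..0x1B
    2, 2, 2, 2, 6, 2, 2, 5, 8, 2, 2, 2, 2, 8,   # 0x1C..0x29
]


def frame_for_phoneme(code):
    """Map a phoneme code to its FRM frame, rejecting out-of-range codes."""
    if not 0 <= code < len(PHONEME_TO_FRAME):
        raise ValueError("invalid phoneme code 0x%02X" % code)
    return PHONEME_TO_FRAME[code]


def invalid_phonemes(phonemes):
    """Return (index, code) pairs for every code above 0x29."""
    return [(i, c) for i, c in enumerate(phonemes) if c >= len(PHONEME_TO_FRAME)]
```

Raising on a bad code, instead of only logging it, keeps an out-of-range value from ever being used as a table index.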

Talking-head FRMs

LIP playback renders frames from a talking-head phoneme FRM. The current reaction selects which phoneme animation is locked:

| Reaction | Head animation id | Filename suffix | Meaning |
| --- | --- | --- | --- |
| Good | 9 | gp | Good phoneme frames. |
| Neutral | 10 | np | Neutral phoneme frames. |
| Bad | 11 | bp | Bad phoneme frames. |

For a head base MYRON, the neutral phoneme FRM is built as art\heads\MYRONnp.frm. The phoneme FRM must provide at least the frames referenced by the phoneme table, normally frames 0 through 8.
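
The filename construction amounts to joining the head base, the reaction suffix, and the extension. A minimal sketch, with the hypothetical names `PHONEME_SUFFIX` and `phoneme_frm_path` and string keys chosen only for illustration:

```python
# Reaction name -> filename suffix for the phoneme FRM.
PHONEME_SUFFIX = {"good": "gp", "neutral": "np", "bad": "bp"}


def phoneme_frm_path(head_base, reaction):
    """Build the phoneme FRM path for a head base name and a reaction."""
    return "art\\heads\\" + head_base + PHONEME_SUFFIX[reaction] + ".frm"
```

For example, `phoneme_frm_path("MYRON", "neutral")` yields `art\heads\MYRONnp.frm`, matching the path given above.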

The full talking-head suffix table used by CE is:

| Animation id | Suffix | Purpose |
| --- | --- | --- |
| 0 | gv | Very good reaction. |
| 1 | gf# | Good fidget. The number is the selected fidget index. |
| 2 | gn | Good to neutral transition. |
| 3 | ng | Neutral to good transition. |
| 4 | nf# | Neutral fidget. |
| 5 | nb | Neutral to bad transition. |
| 6 | bn | Bad to neutral transition. |
| 7 | bf# | Bad fidget. |
| 8 | bv | Very bad reaction. |
| 9 | gp | Good phonemes. |
| 10 | np | Neutral phonemes. |
| 11 | bp | Bad phonemes. |

heads.lst is also used for fidget counts. After the head base name, the comma-separated values are read as good, neutral, and bad fidget counts. This metadata is not stored in the LIP file, but it matters for the same talking-head presentation system.
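
Reading one heads.lst entry could look like the sketch below. The function name `parse_heads_lst_line` is hypothetical, and the sample line used in the usage note is made up for illustration, not taken from a real heads.lst.

```python
def parse_heads_lst_line(line):
    """Split one heads.lst entry into a base name and three fidget counts."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) < 4:
        raise ValueError("expected a base name and three fidget counts")
    # Order after the base name: good, neutral, bad.
    good, neutral, bad = (int(field) for field in fields[1:4])
    return {
        "base": fields[0],
        "good_fidgets": good,
        "neutral_fidgets": neutral,
        "bad_fidgets": bad,
    }
```

Given a hypothetical line such as `MYRON,2,3,1`, this returns the base name `MYRON` with 2 good, 3 neutral, and 1 bad fidget.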

Version 1 notes

Fallout 2 CE contains a version 1 reader, but the normal documented Fallout 2 format is version 2. Version 1 has a much larger header with old pointer-like fields, extra extension strings, and a 260-byte path/string field before the phoneme and marker arrays. CE reads those fields, then clears pointer fields and normalizes some runtime values.

| Offset | Size | Field | Notes |
| --- | --- | --- | --- |
| 0x00 | 4 | version | Value 1. |
| 0x04 | 4 | unknown_04 | Normalized by CE after load. |
| 0x08 | 4 | flags | Runtime flags. |
| 0x0C | 4 | sound_pointer | Old pointer-like field. CE reads and discards it. |
| 0x10 | 4 | unknown_10 | Preserved in runtime state. |
| 0x14 | 4 | buffer_pointer | Old pointer-like field. CE reads and discards it. |
| 0x18 | 4 | phoneme_pointer | Old pointer-like field. CE reads and discards it. |
| 0x1C | 4 | decoded_audio_length | Same practical role as version 2 offset 0x10. |
| 0x20 | 4 | start_offset | Playback start offset. |
| 0x24 | 4 | phoneme_count | Number of phoneme bytes after the version 1 header. |
| 0x28 | 4 | unknown_28 | Preserved in runtime state. |
| 0x2C | 4 | marker_count | Number of marker records after the phoneme bytes. |
| 0x30 | 4 | marker_pointer | Old pointer-like field. CE reads and discards it. |
| 0x34 | 0x1C | unknowns | Seven int32 fields preserved or normalized in runtime state. |
| 0x50 | 8 | audio_name[8] | Base filename. |
| 0x58 | 4 | audio_ext[4] | Extension string. |
| 0x5C | 4 | text_ext[4] | Extension string. |
| 0x60 | 4 | lip_ext[4] | Extension string. |
| 0x64 | 260 | path_or_text[260] | Old fixed string block. |
| 0x168 | phoneme_count | phonemes[] | Same byte array concept as version 2. |
| 0x168 + phoneme_count | marker_count * 8 | markers[] | Same marker record concept as version 2. |

Unless you are preserving known old data, new files should be written as version 2.
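
For tools that still need to inspect old data, the version 1 header can be unpacked in one step. This sketch assumes version 1 uses the same big-endian integer order as version 2; the name `parse_v1_header` and the returned keys are hypothetical, and pointer-like fields are simply ignored.

```python
import struct

# Twenty big-endian int32 fields (0x00..0x4F), then audio_name[8],
# three 4-byte extension strings, and the 260-byte string block.
V1_HEADER = struct.Struct(">20i8s4s4s4s260s")
assert V1_HEADER.size == 0x168


def parse_v1_header(data):
    """Unpack the version 1 header, keeping only the fields a tool needs."""
    fields = V1_HEADER.unpack_from(data, 0)
    ints = fields[:20]
    if ints[0] != 1:
        raise ValueError("not a version 1 LIP header: version=%d" % ints[0])
    phoneme_count = ints[9]   # offset 0x24
    marker_count = ints[11]   # offset 0x2C
    return {
        "decoded_audio_length": ints[7],   # offset 0x1C
        "start_offset": ints[8],           # offset 0x20
        "phoneme_count": phoneme_count,
        "marker_count": marker_count,
        "audio_name": fields[20].split(b"\0", 1)[0].decode("ascii", "replace"),
        "phonemes_offset": 0x168,
        "markers_offset": 0x168 + phoneme_count,
    }
```

The size assertion doubles as a check that the field list adds up to the 0x168-byte header given in the table.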

Reader recipe

  1. Open SOUND\SPEECH\<head>\<audio>.LIP in binary mode.
  2. Read the 32-bit big-endian version.
  3. For version 2, read the fixed fields through audio_ext[4].
  4. Allocate and read phoneme_count bytes.
  5. Allocate and read marker_count marker records, each {int32 type, int32 position}.
  6. Validate phoneme codes, marker types, first marker position, monotonic marker positions, and expected file length.
  7. Load the matching ACM from SOUND\SPEECH\<head>\<audio_name>.ACM.
  8. During playback, advance through markers by decoded sound position and render the mapped frame from the current reaction's phoneme FRM.
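
Steps 2 through 6 of the recipe can be combined into one function that works on the raw file bytes. This is a sketch under stated assumptions: `read_lip_v2` and its result keys are hypothetical names, a short file raises an error while trailing bytes are only reported, and opening the file and loading the ACM (steps 1 and 7) are left to the caller.

```python
import struct

HEADER = struct.Struct(">8i8s4s")   # fixed 0x2C-byte version 2 header
MARKER = struct.Struct(">ii")       # {int32 type, int32 position}


def read_lip_v2(data):
    """Parse and validate a version 2 LIP file held in memory."""
    if len(data) < HEADER.size:
        raise ValueError("file is shorter than the 0x2C-byte header")
    fields = HEADER.unpack_from(data, 0)
    version, phoneme_count, marker_count = fields[0], fields[5], fields[7]
    if version != 2:
        raise ValueError("unsupported version %d" % version)
    if phoneme_count < 0 or marker_count < 0:
        raise ValueError("negative array count in header")

    markers_at = HEADER.size + phoneme_count
    expected = markers_at + marker_count * MARKER.size
    if len(data) < expected:
        raise ValueError("file is truncated: %d < %d" % (len(data), expected))
    phonemes = data[HEADER.size:markers_at]
    markers = list(MARKER.iter_unpack(data[markers_at:expected]))

    problems = []
    if len(data) != expected:
        problems.append("%d trailing bytes" % (len(data) - expected))
    for i, code in enumerate(phonemes):
        if code > 0x29:
            problems.append("phoneme %d: invalid code 0x%02X" % (i, code))
    if markers and markers[0][1] != 0:
        problems.append("first marker position is not 0")
    for i, (marker_type, position) in enumerate(markers):
        if marker_type not in (0, 1):
            problems.append("marker %d: unexpected type %d" % (i, marker_type))
        if i > 0 and position < markers[i - 1][1]:
            problems.append("marker %d: position decreases" % i)

    return {
        # Name used for SOUND\SPEECH\<head>\<audio_name>.ACM in step 7.
        "audio_name": fields[8].split(b"\0", 1)[0].decode("ascii", "replace"),
        "phonemes": phonemes,
        "markers": markers,
        "problems": problems,
    }
```

A caller would read the file with `open(path, "rb").read()`, pass the bytes in, and refuse to use the result when `problems` is non-empty.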

Authoring notes

Source references