Techniques for Manipulating Text (and why that’s useful)

Author: Dana Miles

presented by Evan Schiff

What do these files have in common?

• ALE
• Avid Bin Export
• FCP XML
• SubCap
• STL Subtitles

They are all plain text.

You Can Do A Lot With Plain Text

• Reformat one type of file into another
• Add, remove, or fix data generated by other applications
• Quickly hunt down a specific piece or pattern of information within a large document
• Make changes en masse to avoid time-consuming manual adjustments
• Parse it using a Terminal command or script

Some Real World Examples

• Convert an EDL with Locators into a SubCap file for importing back into Avid
• Convert Avid Locators into a DVDSP or Compressor Chapter Markers file
• Create an EDL out of data in your Filemaker codebook
• Batch rename files with advanced substitution patterns
• Process a list of missing ProTools media to automatically hunt it down

What Tools Do You Need?

A good text editor. (Not TextEdit or Notepad.)

Textmate ($64)

Atom (Free)

Sublime ($70)

Starting Out

A) What type of data do you have?
B) What type of data do you need?

How do you turn A into B?

Starting Out

• Get familiar with everything a text editor can do.
• Learn to navigate using the keyboard.
• Experiment.

After Learning the Basics

• Learn Regular Expressions (RegEx)
• Learn a programming language such as Python, JavaScript, or Bash
• Combine scripting with RegEx to accomplish complex tasks
• Constantly reassess your workflow to find faster and easier methods
• If you come up with something cool, share it!

Text Manipulation Without Regular Expressions

Multiple Cursors Demo

• Most of the time when we want to manipulate text, we want to change a lot of it all at once
• One way to do that is with Multiple Cursors
• Let’s look at what that is, and how it can be useful

Multiple Cursors Demo

• Cmd-F: Find
• Opt-Enter: Find All, Add a Cursor at every Occurrence
• Cmd-Click Text: Add a Cursor manually
• Cmd-Shift-L: For every line of selected text, add a Cursor

OS X Keyboard Navigation

• Cmd ←/→: Go to start/end of line
• Cmd ↑/↓: Go to top/bottom of document
• Opt ←/→: Go to previous/next word
• Add Shift to select or unselect text

Windows Keyboard Navigation

• Home/End: Go to start/end of line
• Ctrl Home/End: Go to top/bottom of document
• Ctrl ←/→: Go to previous/next word
• Add Shift to select or unselect text

Text Manipulation Using Regular Expressions

What are Regular Expressions?

• Regular Expressions (RegEx) are a way to define patterns of text
• They enable you to find and select text that matches those patterns
• And with that matching text selected you can then change it to suit your needs
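As a rough illustration of that find-then-change cycle, here is a minimal Python sketch using the standard `re` module. The clip names and the pattern are invented for this example and do not come from a real project.

```python
import re

# Hypothetical list of clip names, made up for illustration.
text = "A001_C003.mov A001_C004.mov notes.txt"

# Pattern: the letter C followed by exactly three digits.
pattern = re.compile(r"C(\d{3})")

# Find: collect every piece of text that matches the pattern.
matches = [m.group(0) for m in pattern.finditer(text)]

# Change: rewrite each match, reusing the captured digits (\1).
changed = pattern.sub(r"CLIP\1", text)

print(matches)  # ['C003', 'C004']
print(changed)  # A001_CLIP003.mov A001_CLIP004.mov notes.txt
```

The same two steps (match, then substitute) are what a text editor's RegEx Find & Replace does behind the scenes.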

What Does It Look Like?

(\d{3}[^\n]*([0-9:]{11})\s([0-9:]{11})\s?\n[\s\S]*?(?=^\d{3}|^>|^$))

Wait, what the #\.$*^# is that?!

Hold it, Hold it, What the hell is that shit?!

• RegEx is made up of codes
• Each code represents a set of characters such as letters, numbers, and $#!*
• When written in a specific sequence, they define a pattern of text.

Let’s take a closer look.

What does RegEx look like?

Simple:

Timecode:
\d{2}:\d{2}:\d{2}:\d{2}
Example: 01:02:03:04

E-Mail Address:
[\w.-]+@[A-Za-z0-9.-]+\.[A-Z]{2,4}
Example: [email protected]
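The two simple patterns can be tried out in Python as a quick sketch. The sample sentence and the address `editor@example.com` are placeholders invented here. Note one assumption: the e-mail pattern ends in `[A-Z]{2,4}`, so it only matches lowercase domains when case-insensitive matching is switched on, which this sketch does with `re.IGNORECASE`.

```python
import re

TIMECODE = re.compile(r"\d{2}:\d{2}:\d{2}:\d{2}")

# re.IGNORECASE lets the [A-Z]{2,4} ending match lowercase domains too.
EMAIL = re.compile(r"[\w.-]+@[A-Za-z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)

# Made-up sample text containing one timecode and one address.
line = "Send the fix at 01:02:03:04 to editor@example.com by 10:30."

timecodes = TIMECODE.findall(line)
emails = EMAIL.findall(line)

print(timecodes)  # ['01:02:03:04']
print(emails)     # ['editor@example.com']
```

"10:30" is left alone because the timecode pattern insists on four two-digit groups.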

Complex:

EDL Event:
(\d{3}[^\n]*([0-9:]{11})\s([0-9:]{11})\s?\n[\s\S]*?(?=^\d{3}|^>|^$))
Reference:
003 08P013V V C 13:31:20:14 13:31:28:16 04:00:42:22 04:00:51:00

Locator in an EDL:
\* LOC.*[\d:]{11}\s+([\w]+) +\b([^\r\n]*?)\r?\n
Reference:
* LOC: 04:00:47:20 BLUE CS0020
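To see the two complex patterns working together, here is a hedged Python sketch. It assumes the event pattern is run in multiline mode (so `^` and `$` match at line boundaries), and the second event in the sample is invented for illustration. In a CMX3600-style EDL the last two timecodes on an event line are the record in and out, which is what groups 2 and 3 capture.

```python
import re

# Sample EDL text modeled on the reference lines above.
# The second event (004) is invented for illustration.
edl = (
    "003  08P013V  V     C        13:31:20:14 13:31:28:16 04:00:42:22 04:00:51:00\n"
    "* LOC: 04:00:47:20 BLUE    CS0020\n"
    "004  08P014V  V     C        13:32:00:00 13:32:04:00 04:00:51:00 04:00:55:00\n"
    "* LOC: 04:00:52:10 RED     CS0030\n"
    "\n"
)

# Assumption: MULTILINE is needed so ^ and $ work per line.
EVENT = re.compile(
    r"(\d{3}[^\n]*([0-9:]{11})\s([0-9:]{11})\s?\n[\s\S]*?(?=^\d{3}|^>|^$))",
    re.MULTILINE,
)
LOCATOR = re.compile(r"\* LOC.*[\d:]{11}\s+([\w]+) +\b([^\r\n]*?)\r?\n")

events = []
for event in EVENT.finditer(edl):
    # Groups 2 and 3 are the last two timecodes on the event line.
    record_in, record_out = event.group(2), event.group(3)
    # Group 1 is the whole event, including any locator lines under it.
    loc = LOCATOR.search(event.group(1))
    color, comment = (loc.group(1), loc.group(2)) if loc else (None, None)
    events.append((record_in, record_out, color, comment))

for row in events:
    print(row)
```

Each tuple holds the record in, record out, locator color, and locator text, which is the raw material for writing out another format such as a subtitle or marker file.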

What are some other patterns you can think of?

Let’s test them with Rubular.com

RegEx Structure

Regular Expressions consist of character codes and quantifiers.
Or in other words: what is the character you are looking for, and how many times do you expect to see it?

Search Criteria          Code       Example
3 Digits:                \d{3}      315
A word of any length:    \w+        Avid
Zero or one ‘u’:         colou?r    color or colour
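The three rows of the table can be checked directly in Python. This is only a sketch; the sample strings are invented.

```python
import re

# Each pattern pairs a character code with a quantifier.
three_digits = re.findall(r"\d{3}", "Reel 315")          # digit, exactly 3 times
words = re.findall(r"\w+", "Avid Media Composer")        # word character, 1 or more
spellings = re.findall(r"colou?r", "color, colour, colr")  # 'u', zero or one time

print(three_digits)  # ['315']
print(words)         # ['Avid', 'Media', 'Composer']
print(spellings)     # ['color', 'colour']
```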

RegEx Symbols

Symbol       What It Represents                                       Such As
.            Any character except line break                          A-Z 0-9 Special Characters
\w           Any character that could be part of a word               A-Z 0-9 _
\d           Any digit                                                0-9
\s           Whitespace (spaces, tabs, line breaks)
\t           Tab
\n and \r    Line Break: \n is Mac/Linux, \r\n is Windows
[ ]          Match the letters or symbols inside the brackets.        [DEF]
             The example to the right matches only the letters
             D, E, or F
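A quick way to get a feel for these symbols is to run each one against a short string and see which characters it picks out. The strings below are invented for this sketch.

```python
import re

any_char = re.findall(r".", "a1\n!")        # everything except the line break
word_chars = re.findall(r"\w", "a_1-!")     # letters, digits, underscore
digits = re.findall(r"\d", "TC 01:02")      # digits only
whitespace = re.findall(r"\s", "a b\tc\n")  # space, tab, line break
bracket = re.findall(r"[DEF]", "ABCDEFG")   # only D, E, or F

# \r?\n copes with both Mac/Linux (\n) and Windows (\r\n) line breaks.
lines = re.split(r"\r?\n", "one\r\ntwo\nthree")

print(any_char)    # ['a', '1', '!']
print(word_chars)  # ['a', '_', '1']
print(digits)      # ['0', '1', '0', '2']
print(whitespace)  # [' ', '\t', '\n']
print(bracket)     # ['D', 'E', 'F']
print(lines)       # ['one', 'two', 'three']
```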

RegEx Quantifiers

Symbol    What It Represents
?         Zero or one occurrence
+         One or more occurrences
*         Zero or more occurrences
{2}       Exactly 2 occurrences
{2,5}     Between 2 and 5 occurrences
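The difference between the quantifiers is easiest to see by asking whether a whole string fits a pattern. The helper name `fits` and the test strings are made up for this sketch.

```python
import re

def fits(pattern, text):
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

results = {
    "ab? vs a": fits(r"ab?", "a"),                 # zero 'b' is allowed
    "ab? vs abb": fits(r"ab?", "abb"),             # two 'b' is too many
    "ab+ vs a": fits(r"ab+", "a"),                 # at least one 'b' required
    "ab+ vs abbb": fits(r"ab+", "abbb"),
    "ab* vs a": fits(r"ab*", "a"),                 # zero or more 'b'
    r"\d{2} vs 12": fits(r"\d{2}", "12"),
    r"\d{2} vs 123": fits(r"\d{2}", "123"),        # exactly two digits only
    r"\d{2,5} vs 12345": fits(r"\d{2,5}", "12345"),
    r"\d{2,5} vs 123456": fits(r"\d{2,5}", "123456"),  # six digits is too many
}

for label, ok in results.items():
    print(label, ok)
```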

Some Real World Demos

• Convert an EDL with Locators into a SubCap file for importing back into Avid
• Convert Avid Locators into a FCP or Compressor Chapter Markers file
• Batch rename files with advanced substitution patterns
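As one possible take on the batch-rename demo, here is a small Python sketch. The naming scheme (`scene<N>_take<M>` turned into `S<NNN>_T<MM>`) and the function name `new_name` are hypothetical, chosen only to show captured groups being reused in the replacement. The sketch builds a rename plan without touching any files.

```python
import re

# Hypothetical scheme: "scene<N>_take<M>.<ext>" becomes "S<NNN>_T<MM>.<ext>".
PATTERN = re.compile(r"scene(\d+)_take(\d+)\.(\w+)")

def new_name(old_name):
    """Return the new file name, or None when the name does not fit the pattern."""
    m = PATTERN.fullmatch(old_name)
    if m is None:
        return None
    scene, take, ext = m.groups()
    # Zero-pad the captured numbers so the files sort correctly.
    return f"S{int(scene):03d}_T{int(take):02d}.{ext}"

names = ["scene12_take3.mov", "scene7_take10.mov", "notes.txt"]

# Only names that fit the pattern end up in the plan.
plan = {old: new_name(old) for old in names if new_name(old) is not None}

for old, new in plan.items():
    print(old, "->", new)
```

Once the plan looks right, each pair could be passed to `os.rename` to perform the actual rename.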

When Is RegEx Not Useful?

• When you don’t know or can’t clearly define a pattern
• If it’s faster to make the changes by hand than to figure out what the pattern is
• When there’s no variation in what you’re searching for. Sometimes a normal Find & Replace is all you need
• When there’s too much variation, don’t try the all-in-one approach. Maybe you can break it down into multiple smaller patterns
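The last two points can be sketched in Python: a plain replace for fixed text, and a chain of small patterns instead of one large one. The sample line is invented.

```python
import re

# Made-up line with messy spacing.
line = "Reel  A001   v1   "

# No variation in the search text: an ordinary find and replace is enough.
renamed = line.replace("v1", "v2")

# Several kinds of mess: apply small patterns one after another
# rather than writing a single all-in-one expression.
step1 = re.sub(r"\s+$", "", renamed)  # drop trailing whitespace
step2 = re.sub(r" {2,}", " ", step1)  # collapse runs of spaces

print(repr(step2))  # 'Reel A001 v2'
```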

So What’s the Next Step?

Shell Scripting

What is Shell Scripting?

Shell scripting uses a programming language to execute a series of commands in order to accomplish a more complex task

ProTools Example

In 40 lines of code, and using regular expressions, this script takes a text file of media that ProTools can’t find, locates it, and copies it to a directory on your desktop.
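The presenter's script is not reproduced here. As a rough idea of how such a tool could be put together, here is a Python sketch under stated assumptions: the report is a text file where each relevant line ends in an audio file name (`.wav`, `.aif`, or `.aiff`), and the names `MEDIA_NAME`, `missing_names`, and `find_and_copy` are invented for this example. The real ProTools report format may differ.

```python
import os
import re
import shutil

# Assumption: each relevant line ends with an audio file name.
MEDIA_NAME = re.compile(r"([^/\\:\n]+\.(?:wav|aiff?))\s*$", re.IGNORECASE)

def missing_names(report_text):
    """Pull the file names out of the report text."""
    names = set()
    for line in report_text.splitlines():
        m = MEDIA_NAME.search(line)
        if m:
            names.add(m.group(1).strip())
    return names

def find_and_copy(report_text, search_root, destination):
    """Search search_root for the named files and copy any hits to destination.

    Returns (copied, still_missing) as two sorted lists of file names.
    """
    wanted = missing_names(report_text)
    os.makedirs(destination, exist_ok=True)
    copied = []
    for folder, _subdirs, files in os.walk(search_root):
        for name in files:
            if name in wanted:
                shutil.copy2(os.path.join(folder, name), destination)
                copied.append(name)
                wanted.discard(name)  # copy only the first hit for each name
    return sorted(copied), sorted(wanted)
```

A call might look like `find_and_copy(open("missing.txt").read(), "/Volumes/Media", os.path.expanduser("~/Desktop/Found Media"))`, where the report name and both paths are placeholders.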

How Do I Learn?

• Google it!
• Check out sites like Codecademy, Code Avengers, Khan Academy, etc.
• Pick a language to learn, and of the many languages out there, I would probably start with Python

Thanks!