Loading...

The Fast Meme Transform: Convert Audio Into Linux Commands

2018-02-10 – By Robert Elder

     In recent years the meme economy has grown to have a significant influence on our everyday lives and culture.  This influence has non-negligible effects on the measured GDP of the ‘real’ non-meme economy.  Recently, the influence of memes has extended to even affect our political system.  Although some might consider memes to be merely a joke not worthy of serious consideration, the significance of memes cannot be under-stated and it is therefore meaningful to explore new ways in which memes can be proliferated.

     This article will present a method of applying the Cooley-Tukey Fast Fourier Transform algorithm to convert audio samples from YouTube videos that contain dank memes into small self-contained Linux commands.  These self-contained Linux commands describe the coefficients of the discrete Fourier series and are packaged along with a small audio decoder (written in awk) that performs the inverse Fourier transform (along with a few small adjustments) to decode the audio.  Expressing several seconds of audio data in terms of standard printable ASCII characters that are simultaneously easily copied, but are also easy for a listener to understand, presents various challenges.  Several techniques to improve sound quality and compress the Fourier co-efficients are also discussed to meet these challenges, with an overall satisfactory final outcome.  This article is also part of a collection:

     A meme codec audio too called the Fast Meme Transform has been created which can be used to easily transform audio samples from YouTube videos into Linux commands.  The Fast Meme Transform tool itself is a small ISO C99 program with no external dependencies other than the C standard library, so it should be versatile in situations that require rapid meme deployment.  For example, you can make use of the ‘youtube-dl’ tool to quickly download the YouTube video and extract the audio into a standard PCM wav format just like in the example below:

youtube-dl M6PbdJiAK84 -x --audio-format wav -o "tripple.%(ext)s"

     Then, pass the wav data through the Fast Meme Transform encoder tool:

./fmt -f tripple.wav -s 4.9 -e 7.9

     Here is the output you’ll get after running the above commands:

echo -n 'uuuuJC8/CvOoO2/OgO5EGvHLHU:i:a:]:_:8I:c<3/D1>D2(DqD4/D5YD6FD7>DaDc?l?{?W?5E9z9fDiDa?k?1=r?2=r?4Y?h?m?[n9z9Q?|?3I?4Br?5I?6>?7}190E9f?P?f?p?4/?5Br?6B5#?7/n8h8a8ZD2/DWDgD5>DtD8/Dc?|?o?2/?Ln7h7L7_7a8l8P8WCZ:o:d:p:g:s:b:]<0}<1>72(7Q75/7s7UCkCRCwCtC7E5_6f6p6R6L6Un51F5p54E4h4L4b49EJ77/78I8V8z83X8R39z39W39y397Fn80F8f8i8cKQKmK[j0Xn8h86I8U8b8ZKgKyT{5O%NUN]61V869^y8t8_8[8cKQK_TzT3/T4/N[Nc61l6G^6M7M]9VKmTPTpTgNc61|6G(6H^w8tSRSsSaS]5:*5<*59bN|NoNqNRNyN6^y8m<]DV33vSP4OYS3YS4/S9^R85EuuJ7R7w7s77Y79>8k8W8Qn71X74>75=r76>77/78MP:k:{:3/:5E7U79}80=r8{82MW8R8L:Q:5/:L:Z9{D3XDRDL26_26v27l272(27QnDiDa?l35R35w35t35[3N%j6Fn?z?f35[35c3N/36P363Xnu8V82Mg8s:6I:7Y:[:3>:g:w:tn7fH7/ClCPC4B5#C5BrCLCiCaC]:knGWGwG7IG9>HkHoH2IHgH5}H6>Hin9L9b10q10R11V11{1G(1H%1D*1?*11cG|GPGLGUHlHUH[CoCqC5/CLC_C9F:l:q:h:L:Z' | hexdump -v -e '/1 "%un"' | awk 'function d(f,k,a,b,o,w,i,q,h,j){for(k=32;k<127;k++)o[sprintf("%c", k)]=k;split("e-0,,, 2x,,=1#,=-1,,=2#,=-2,&#,+,,-,.,)#,0,1,2,3,4,5,6,7,8,9,15,;,16,=,=3#,18,0=0,9 3,=-,14,17,%n,BAx,12,13,B3#,@n,43,6*,/8,60,22,1*,4*,4%,42,52,7*,0%,3*,=Ax,B4#,9%,8*,,9(,E8,7%,,8(,8%,9/,2*,e,2%,4(,5*,7(,44,0(,0*,6%,,1/,3(,3%,6#,6/,6(,JJ,9*,5(,x,5%,1%,1(,0/,=4#,,",a,",");split("0,1,2,3,4,5,6,7,8,9,=,e,x,+,-,.,n,;",w,",");q=0;for(g in w)if(o[w[g]]==f)q++;i=f-32+1;if(q>0)printf(f==110?"n":a[i]);else{h=split(a[i],b,"");for(j=0;j' | awk 'BEGIN{L=2400;N=8192;for(n=0;n0?(R>32767?32767:R):(R<-32768?32768:65536+R)));}else if(n<2*L){o[n-L+1]=R;}}}' | awk '{a[i++]=$0}END{while(j' | xxd -r -p | aplay -q -c 1 -f S16_BE -r 48000

     Listen to the audio output of the above command: tripple.wav

     The text you see above is a self-contained Linux command that includes all of the audio data from the meme as well as the necessary software to decode it.  The final output of the command is raw PCM that needs only to be directed to the system audio driver.  As you can see, the above Linux command is rather large, but it is still small enough to easily share through copy and pasting.

     In situations where the existence of the Illuminati has been confirmed, it is often appropriate to make others aware of this fact by means of brief musical tone:

youtube-dl GRWbIoIR04c -x --audio-format wav -o "confirmed.%(ext)s"
./fmt -f confirmed.wav -e 10 -m 5

     you’ll get the following output:

echo -n 'UUUUF15M7Y?SrN3a4K5]6ka8K9%|@5V%bM5*16t_*SWZ23aiaR|>31*43C3(6?7Z8T9=7#B=4#13H6Y*GT27]9zC4%5[Z{Y*GN1qZ23K7]8*7sy*7[?39Is8*B?<[email protected]@F8uM7M8(Gg[*[email protected]^bKiC8IN9C9[kN8l>28KR76W7C17H8t*SJ27%|[email protected]?SN8]9O4m*77W1PsXt*SJ38V5pfR90q4P5*njzO7f9P7P8*nXt*74V5u0}1P5Cj([email protected]>w*<4P5CEI4gY>47(55I3C36>37>76?wMG?G3>A6obV4O7MS(G0?G1C<>75pQ9>E6(E7(G2_3g[O7fR91}2v>G1_2lQV5u0u1JE^bV4(89_1lI_0_1g[Q2I3I5Y9_2C8Q3IsbI3JA3WsBI3P7P8*[email protected]@1GK65K67N68]6^263K65N66][email protected]@[email protected]?XZE?S]8a9M70W6P8CBQH4>17H8TETSK7NR|W2W3q5r6cXZEZSN8]9P2P4q5K70CXtT2i(|P2JA5JA7r8*njZGKRA4>A5>A7CXYTG]7]8%[email protected]?Ga8N9P4>[email protected]*GN8(47%[email protected](GCE_kN8%47W6CEK9%[email protected]^[email protected]^[email protected]}d4O4V5C3(4?5(8?B*<*{H5tmV6C8}*BH2M3V4pV7C5f[?8R90?n5u?Bd2H4M5(h%57pf^7(8(BQ*12H6Jh(57V6g*B*13H4%h>wl[I(hI2Q3*<4g?BM4H5%55c4(8IH3M5Q2?wC9*BpI3MGC5f[IM4zV7J<9>1SCbJ75lImpf9?91c9[H4N3K4o<(8^[email protected]><[email protected]<2>1GMSC<1Q^EK3I9>[email protected]@[email protected]@bM4([email protected]>n' | hexdump -v -e '/1 "%un"' | awk 'function d(f,k,a,b,o,w,i,q,h,j){for(k=32;k<127;k++)o[sprintf("%c", k)]=k;split("e-0,,, 2x,,=1#,=-1,,&#,=-2,=2#,+,,-,.,9 3,0,1,2,3,4,5,6,7,8,9,0=0,;,11,=,=/x,)#,%n,22,10,(n,=-/,19,:n,20,%1,(<,Dx,(2,=-3,(1,%2,%7,%A,%<,9(,21,=3#,FF,(7,(A,18,%E,L#,%B,,*2,[email protected],%G,,?2,RB,>n,?1,e,%8,@9,56,7K8,X?E,N7,@8,O5,,Jn,O6,?A,*A,4C,*E,(9,HG,<3,x,5%6,?76,14,30,%9,,",a,",");split("0,1,2,3,4,5,6,7,8,9,=,e,x,+,-,.,n,;",w,",");q=0;for(g in w)if(o[w[g]]==f)q++;i=f-32+1;if(q>0)printf(f==110?"n":a[i]);else{h=split(a[i],b,"");for(j=0;j' | awk 'BEGIN{L=2400;N=8192;for(n=0;n0?(R>32767?32767:R):(R<-32768?32768:65536+R)));}else if(n<2*L){o[n-L+1]=R;}}}' | awk '{a[i++]=$0}END{while(j' | xxd -r -p | aplay -q -c 1 -f S16_BE -r 48000

     Listen to the audio output of the above command: confirmed.wav

     The above command will play the first 10 seconds of the X-files theme music.  The -m parameter was added in this case to increase the volume by 5 since it’s a bit quiet otherwise.

     For situations that require a Linux command that can highlight moments of unexpected disappointment:

youtube-dl Ag1o3koTLWM -x --audio-format wav -o "curb.%(ext)s"
./fmt -f curb.wav -e 10

     you’ll get the following output:

echo -n '[email protected]<6}[email protected]([email protected]<6}y3H4]P|l5O7*<5^[email protected][3J6jzbI7>8F<7>8H7V8:40p6:57uP75(77J6M8K9RCa6a9H8[9u6*57*75(76:77T3T4I7B8M9=4#Cy8[9u5u6:57(94I9RC[8IPC*nC:DI8:<8F1PnDV7F4W59O1l4z^k5a6V1f3]4l0O2N63N64%8W83F<7oPD:2L41fw43O0O1:6L63l4*8l7F6tSZ1o12T4M18J7%dT2T3:<7Z3T4BiF<6Z2X0BiBmF<5^Z3X0X3zF17B8(Yo|BmJ6_(SMmB54Q35F36]1(d:S(Y%9WnY:92K12K13K15%[email protected]%Y>[email protected](S:nqWUa4[[email protected]_1:nDf3f4R45O4O6*67M13b9_0I1F43f4O6QA<1b9J0J1>WYF92J0B45%h6[7:54v6*87%d:9L93T2T6Q55u6ut88FS:YK11T2K14Z5X1J12T3R114T5Z6]56I11Z2Z3T4K15I12Z3Z5oiX2JiB5LndFY:9L456{S*YX1f54I4XbR18>PD%d(SB|X3zB8H3V5]8f9O5O7lt84%85Z3K14KmJ7H3H4*68T2Z4T5(cGdRSR143>44{8td*Y(9wh5y6]0v8:d(S*YM|Bm{22*4L45l3l6(ST3T4QDaw44]5l5l6%Y(9Ln6tcnS*YM59^[email protected](YI4g([email protected]^9o70_1g2}4I2B3a2H1O1O3*64^8K69j1gWnC(UHrO1Ow63I1%[email protected][1]2F169g1_LnDaL41_0_1jLnC]3o69QD%85:86B08J1v4v5K08Q35V7F5w54p5p7v4j0_1jLi0Q36Fm*54*[email protected]>08>26b9_0Qmu4R55:<68F169J69_0:G85*86X9K60:48Wn84%85J5_H0Hr[5F47fP84J6}k7k8H1H2*[email protected]^j[0V1:[email protected]<4K5j>[email protected]^j[2H3]7J4_H1v4X1zM6X0F1iX3zbVW5Lm:55X0I6H2H3H4H5F37V8:4Lmp4*55*[email protected]%D4{83o61^[email protected]:2WUHL38H9F41O0B05_0g1*nC:DR2L36H7H8^9}0J9*sa2RU[0O0F6w63I0Z%syrQq2y7{CQC4:C5=-4#C6:[email protected]%C5*C6%C8(D0(D1:DLD3(D4:D6*D8RD9:248*24P2ia5L2ma54QD5(D6:2C:2Ua25N226N228V48[49V|Hi%h48R349R3iFh46H48[49=-4#3|Rch47H48H|Q347H48y4Ph47Fh3QC:D:Uk9V1y9f1]2*[email protected][1R33H4HP40]L64I0X_>tC%[email protected][3I1XkWUJ6(CV3J5^%DI0I1B4%DJ0T:cccc<4M5^_(CRD=-4#26*[email protected]:CRDaW60*61%7t7P81Q9o5B8B9RC:Dk2*<9(q2J8(C%qW78(79B5W455{s*PC>00Fh4V5[6VP40*41F44p8X3Qi:5w58O0*61M81Fh9O0:61:ccnA<4bN17*C%D%UR24:44(izb(C(D(Uk4ktS([email protected]<4N15*C%U:24V0I0o3K4R16}FD(UH1J3M5K6:[email protected]%CJ3XK6:[email protected][8[[email protected]:5L5P60{|[email protected]>30J27>29Qs*40*41p7p8*59*[email protected]' | hexdump -v -e '/1 "%un"' | awk 'function d(f,k,a,b,o,w,i,q,h,j){for(k=32;k<127;k++)o[sprintf("%c", k)]=k;split("e-0,,, 2x,,=1#,=-1,,&#,=-2,=2#,+,,-,.,9 3,0,1,2,3,4,5,6,7,8,9,)#,;,n1,=,%1,/x,=-?,0=0,(1,20,21,=-3,=?,nAn,%3,(<,%<,:1,2(,*1,=3#,(6,9%,(n,E#,90,B1,23,(3,2%,>5,91,>1,:3,,%4,>6,B7,,(2,B6,GA,89,e,(4,>7,n3,51,K7,%2,%6,53,,@1,%5,D*2,L33,C*D,8%,(5,(8,2:,x,*3,I5,%n,50,M7,,",a,",");split("0,1,2,3,4,5,6,7,8,9,=,e,x,+,-,.,n,;",w,",");q=0;for(g in w)if(o[w[g]]==f)q++;i=f-32+1;if(q>0)printf(f==110?"n":a[i]);else{h=split(a[i],b,"");for(j=0;j' | awk 'BEGIN{L=2400;N=8192;for(n=0;n=0?0:(R>0?(R>32767?32767:R):(R<-32768?32768:65536+R)));}else if(n<2*L){o[n-L+1]=R;}}}' | awk '{a[i++]=$0}END{while(j

     Listen to the audio output of the above command: curb.wav

     Depending on your needs, you may need to choose between high audio quality and large command size versus low audio quality, but small command size.  This can be controlled by the '-t' or '--threshold' option.  With this parameter, any frequency with a power level contribution lower than the threshold option will be dropped from the output.  For example, let's create a command that only keeps frequencies with a power level over 9000:

youtube-dl SiMHTK15Pik -x --audio-format wav -o "9000.%(ext)s"
./fmt -f 9000.wav -s 9.3 -e 12.5 -t 9000

     in this case the output is:

echo -n '(((& n' | hexdump -v -e '/1 "%un"' | awk 'function d(f,k,a,b,o,w,i,q,h,j){for(k=32;k<127;k++)o[sprintf("%c", k)]=k;split("0=0,,, n,,###,%%%,,&&,,,+,,-,.,,0,1,2,3,4,5,6,7,8,9,,;,,=,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,e,,,,,,,,,,,,,,,,,,,x,,,,,,,",a,",");split("0,1,2,3,4,5,6,7,8,9,=,e,x,+,-,.,n,;",w,",");q=0;for(g in w)if(o[w[g]]==f)q++;i=f-32+1;if(q>0)printf(f==110?"n":a[i]);else{h=split(a[i],b,"");for(j=0;j' | awk 'BEGIN{L=2400;N=8192;for(n=0;n0?(R>32767?32767:R):(R<-32768?32768:65536+R)));}else if(n<2*L){o[n-L+1]=R;}}}' | awk '{a[i++]=$0}END{while(j' | xxd -r -p | aplay -q -c 1 -f S16_BE -r 48000

     Listen to the audio output of the above command: 9000.wav

     What 9000?  There's no way that can be right, because the command above doesn't actually contain much audio data.  The frequency power level threshold of 9000 is simply too high.  Let's try again with a power level threshold of 70:

./fmt -f 9000.wav -s 9.3 -e 12.5 -t 70

     and the result looks something like this:

echo -n '_____N8*Q1)Q3*Q9*M1)J0*4<)J3&JrJ7)J8&K0(K3&KsK8)4E*5G)5[({5)t3*5c)5j)u0ZvI|506Pi)vI5I6I7&S*500P_O*A)j/1d)V9>O?28ZADE:[email protected]@61Dh(90(1WP29DA(j:g:n26B9&E:i/a(wP29&A)[email protected](Q*S(69/70)102:103(10|173B05?06U08}09>[email protected]*G1?17Bl>V>20T21T24FKBoBg>66?89>9rE3Pm/NDQ:nAI4)m)wRNI9=7#M=9#nQ/[email protected]&G8*G9*XBiFmTN?63>65?66F68U69:O1^O2RO3/O5YO6Dno(<9>03B]Tc>t>53B72F78>79Po(d(S&])[5/[6/1N?<>[>14BVT23B26FO?28?EFb?wBNBM?{BgF60?69Z22*S:]D[7:1N*1Q&1M^1J&l3)l5:l6U28?A>E>b?iY2vUWBh>73B75?77/O8fS)])[6&[7/1N>A?E?bF66F74?75:O6?78^O9R280T96)b1I22I23I24I26*b8/3A(3N(3MIJ)3SIc*4A&]7D]8:na3*apasN4)N5I91*Q4I9|M2(M3DM5)J0)K0k08k<&6[&617)6VkwPS)6V(h0Dh2/hph5:h8/h9^6A&6v:6aY6ND6Q=9#6MD6Kko(6d(64r6Sk53Zc(WY{fJ&d*S&]@[email protected]/50f28f{)t&54D5su/57/g(G0*1<)G4?69>71:O3^[email protected]>79?80?82)3E)4M(c7/c8&W0*W2)W8*501P[2BOBABi>vBaFm?NBo?d?W?{>tB61?80>83>84&t6Z<8(1A>21Fb}i}aY2m?M>K?dBS=7#2]=7#2g(nG6&G8YG9:y*1A/1E>22F25BiFvFm>wTN>M>J>d>45>S>]I85)N6I96fq(X:<3/z/l1&l3:lpl8*V0&V1&V3*V4:VsQ3)Q4I95/Q6/Q9:M0*MpK6Zu(j&g&M7*J0&J3&J5:[email protected](K2*K3DK5^K7(4A&4Q(4Kf4/|8(9(53:54*u&57/j&6iP6)7/8f57&g(61&63F80B82F84>86Zu:57:109)G0/G1&GrG7&qFhF68B70?71F75B78RO9R280>82B83T84>[email protected]&A2*E5:E6:b6*3MIu)a7I8pQsK3P5/8*G9([email protected]@<6)<9)1A>45BW>50T92T93T94>98>99&A9&E1(b2:b3ZX)z^<5/<6(<7Y2WD2{}tRA3I<&E4*Ernq(X=7#yYz/<6(1E&[4FJ?KUo?c?53T54}uUj>64F91*A2(A3*E3)E4PG8*q&X/y*<3Yz:cFWUtF53?55>h>66)A2(A|EpE6fG9*qYXD<3Rz>o}d}SF54?55UuFj>92)A3&AsA6(E4*E8Py/<3%9#z^<6?SU]Y2g?60U61:A9/E0/b3PX:<3^zR[4>45TSTj?gU69?71>72&A8:ErE9:n21)G8([email protected]^2J>89B99)A0/A2*E4f15)q=8#X=8#<3>J?K?98)A1&A2/A4:E6Z20(G7&q/X%8#y/96>97)A6fV>0&57(G3&G4>71F73B87?88(nW({/tP20)N&M:K/o^d/]:c*50Z679k83k8|687k89)691*n' | hexdump -v -e '/1 "%un"' | awk 'function d(f,k,a,b,o,w,i,q,h,j){for(k=32;k<127;k++)o[sprintf("%c", k)]=k;split("03x,,,e- ,,=-,%3#,,=3#,%2#,=2#,+,,-,.,%4#,0,1,2,3,4,5,6,7,8,9,=4#,;,12,=,&2,(2,%5#,30,)2,0=0,=5#,31,*2,11,6#,&3,41,42,Cn,40,38,27,)n,39,%7#,46,/2,:2,19,49,<1,=H,*n,13,,47,%H,LLL,,35,32,48,44,e,&n,59,62,33,58,*6,18,36,,43,4(,<0,5(,5*,52,56,34,37,x,<2,<4,51,5&,@2,,",a,",");split("0,1,2,3,4,5,6,7,8,9,=,e,x,+,-,.,n,;",w,",");q=0;for(g in w)if(o[w[g]]==f)q++;i=f-32+1;if(q>0)printf(f==110?"n":a[i]);else{h=split(a[i],b,"");for(j=0;j' | awk 'BEGIN{L=2400;N=8192;for(n=0;n0?(R>32767?32767:R):(R<-32768?32768:65536+R)));}else if(n<2*L){o[n-L+1]=R;}}}' | awk '{a[i++]=$0}END{while(j' | xxd -r -p | aplay -q -c 1 -f S16_BE -r 48000

     Listen to the audio output of the above command: 9000-70.wav

     Let's do some more experimenting and shift the pitch up.  We'll also pipe it directly into shell so we don't need to copy paste:

./fmt -f 9000.wav -s 9.3 -e 12.5 -t 70 -c 1.3 | sh

     Listen to the audio output of the above command: 9000-high-pitch.wav

     Another option is to change the amount of time in the sampling window.  If you change the sample interval with the '-p' option you will probably need to adjust the threshold value to get the same size output because the sample window size will affect the relative power levels:

./fmt -f 9000.wav -s 9.3 -e 12.5 -t 200 -p 0.02 | sh

     Listen to the audio output of the above command: 9000-short-interval.wav

     If you make the window small enough, you'll get to experience all kinds of artifacts and distortion!

     HIGH VOLUME WARNING!!!  For memes that require volume increases that are high enough to cause distortion, you can use the -m option to increase the volume by a factor of 1000:

./fmt -f 9000.wav -s 9.3 -e 12.5 -t 100 -p 0.02 -m 1000 | sh

     Listen to the audio output of the above command: 9000-high-volume.wav

     HIGH VOLUME WARNING!!!  In order to achieve even higher levels of distortion, you can use the high or low pass filters (high pass in this case):

./fmt -f 9000.wav -s 9.3 -e 12.5 -t 100 -p 0.02 -m 1000 -l h 10 500 | sh

     Listen to the audio output of the above command: 9000-high-distortion.wav

     If the thought of running a huge untrusted shell command full of obviously compressed text scares you (like it should), you may rejoice in knowing that the fast meme transform provides you with an option to leave the data uncompressed.  To see what the Fourier coefficients look like, you can disable compression with the '-u' flag:

./fmt -f 9000.wav -s 9.3 -e 12.5 -t 70 -u

     here is the output of the above command:

echo -en "0=0n0=0n0=0n0=0n0=0n0=0n0=0n0=0n0=0n0=0n0=0n0=0n0=0n0=0n0=0n388=2e-03x391=-2e-03x393=2e-03x399=2e-03x401=-2e-03x410=2e-03x412=-2e-03x413=-3e-03x415=3e-03x417=-2e-03x418=-3e-03x420=3e-03x423=-3e-03x425=2e-03x428=-2e-03x431=2e-03x511=-2e-03x513=3e-03x515=-2e-03x523=2e-03x548=-2e-03x558=-2e-03x560=2e-03xn34=-3e-03x35=-3e-03x506=-2e-03xn33=-2e-03x34=-3e-03x35=-3e-03x36=-3e-03x37=-3e-03x46=2e-03x500=-2e-03xn0=0n0=0n0=0n27=2e-03x30=-2e-03x58=-4e-03x144=-2e-03x199=-3e-03x227=3e-03x228=2e-03xn30=5e-03x31=4e-03x57=-3e-03x58=-5e-03x59=-5e-03x61=5e-03x62=3e-03x90=3e-03x149=-2e-03xn29=5e-03x30=3e-03x58=4e-03x59=4e-03xn26=-2e-03x29=-3e-03x31=4e-03x33=-4e-03x35=3e-03x37=-2e-03xn29=-3e-03x30=-2e-03x33=-5e-03x34=-3e-03x35=3e-03x36=5e-03x37=3e-03x39=2e-03x46=3e-03x69=-4e-03x70=-2e-03x102=4e-03x103=3e-03x105=-3e-03x173=-2e-03x205=3e-03x206=4e-03x208=-5e-03x209=-3e-03x217=3e-03x219=-2e-03xn35=-3e-03x36=-5e-03x38=2e-03x111=3e-03x217=-2e-03x218=-3e-03x219=-3e-03x220=-4e-03x221=-4e-03x224=2e-03x242=-2e-03x243=-2e-03x259=-3e-03x266=3e-03x289=-3e-03x295=3e-03x313=-2e-03xn36=-4e-03x38=5e-03x39=4e-03xn30=-3e-03x34=-2e-03x36=-2e-03x37=-7e-03x38=-3e-03x39=7e-03x40=9e-03xn39=-4e-03x40=-5e-03x41=-3e-03x118=2e-03x119=2e-03x121=-2e-03x233=2e-03x236=-4e-03x238=3e-03x263=-3e-03x265=3e-03x266=2e-03x268=4e-03x269=4e-03x271=-6e-03x272=-7e-03x273=-4e-03x275=6e-03x276=5e-03xn43=3e-03x129=-3e-03x203=-2e-03x212=2e-03x217=-2e-03x219=4e-03x229=2e-03x230=4e-03x235=-2e-03x236=-3e-03x247=-4e-03x248=-3e-03x252=-3e-03x253=-2e-03x272=2e-03x278=-3e-03x279=-2e-03xn43=3e-03x44=3e-03x46=-3e-03x47=-2e-03x135=-4e-03x136=-4e-03x138=3e-03x212=-3e-03x213=-3e-03x214=-2e-03x219=-4e-03x223=-2e-03x226=2e-03x227=3e-03x228=3e-03x231=2e-03x232=3e-03x237=-2e-03x238=-2e-03x240=3e-03x251=-2e-03x259=2e-03x260=3e-03x269=2e-03xn22=2e-03x46=4e-03x47=5e-03x137=4e-03x138=2e-03x139=-3e-03x140=-6e-03x141=-3e-03x183=-2e-03x185=4e-03x186=4e-03x228=3e-03x230=-3e-03x231=-3e-03x232=3e-03x233=6e-03x234=4e-03x249=-2e-03x262=-3e-03x273=-2e-03x275=3e-03x277=-4e-03x278=-3e-03xn46=-2e-03x47=-2e-03x136=-3e-03x137=-4e-03x138=-3e-03x230=3e-03x231=3e-03x232=2e-03x266=2e-03x274=3e-03x275=4e-03x276=3e-03x278=-6e-03x279=-7e-03x280=-4e-03x296=-2e-03x321=-3e-03x322=-3e-03x323=-3e-03x324=-3e-03x326=2e-03x328=-4e-03x330=3e-03x338=3e-03x340=-3e-03x341=-2e-03x346=-3e-03x348=2e-03x430=-3e-03x477=5e-03x478=4e-03xn353=2e-03x354=3e-03x355=2e-03x384=-2e-03x385=-3e-03x391=2e-03x394=-3e-03x395=-3e-03x402=3e-03x403=5e-03x405=-2e-03x410=-2e-03x420=2e-03x608=2e-03x612=-3e-03x613=-3e-03x617=-2e-03x619=2e-03x637=-2e-03xn46=-2e-03x619=3e-03x620=5e-03x622=-4e-03x624=3e-03x625=4e-03x628=-4e-03x629=-6e-03x630=-3e-03x634=4e-03x635=6e-03x638=5e-03x639=9e-03x640=5e-03x642=2e-03x643=3e-03x644=3e-03x645=3e-03x646=2e-03x653=2e-03xn48=3e-03x49=6e-03x51=-3e-03xn41=-3e-03x44=2e-03x46=-3e-03x47=-5e-03x48=-5e-03x49=-4e-03x50=-3e-03xn28=-3e-03xn51=-2e-03x52=-3e-03x54=5e-03x55=2e-03x56=-4e-03x57=-4e-03x59=3e-03x110=2e-03x112=-2e-03x114=3e-03x269=-3e-03x271=4e-03x273=-6e-03x275=5e-03x277=-5e-03x278=-3e-03x279=3e-03x280=3e-03x282=-2e-03x331=-2e-03x440=3e-03x487=-4e-03x488=-3e-03x490=2e-03x492=-2e-03x498=2e-03x501=-2e-03xn132=-2e-03x227=-2e-03x230=-2e-03x233=-3e-03x234=-2e-03x235=2e-03x236=3e-03x238=-2e-03x243=3e-03x244=3e-03x249=3e-03x251=-3e-03x252=-2e-03x261=3e-03x280=-3e-03x283=-3e-03x284=-3e-03x526=2e-03xn128=3e-03x130=-3e-03x221=2e-03x232=-5e-03x233=-5e-03x235=6e-03x236=3e-03x240=-3e-03x242=3e-03x244=-2e-03x246=7e-03x247=7e-03x259=3e-03xn116=-3e-03x118=6e-03x119=4e-03x122=2e-03x130=-4e-03x131=-3e-03x222=2e-03x225=-2e-03x233=2e-03x234=2e-03x236=-3e-03x237=-4e-03x238=-3e-03x240=-3e-03x241=-3e-03x244=-3e-03x245=-3e-03x246=-3e-03x247=-3e-03x385=-2e-03x386=-3e-03x396=-3e-03xn120=3e-03x121=4e-03x123=-4e-03x124=-4e-03x181=-3e-03x183=4e-03x184=3e-03x188=2e-03x190=-3e-03x191=-3e-03x193=2e-03x194=4e-03x195=2e-03x393=-2e-03x394=-3e-03x395=-4e-03x396=-4e-03x399=4e-03x400=2e-03x404=3e-03x426=2e-03xn56=3e-03x58=-3e-03x59=-3e-03x407=2e-03x410=-3e-03x413=-3e-03x415=4e-03x417=-5e-03x419=3e-03x422=2e-03x423=5e-03x425=-6e-03x427=3e-03x430=-3e-03x439=3e-03x442=-3e-03xn4=-4e-03x5=-3e-03x8=3e-03x9=3e-03x53=4e-03x54=2e-03x56=-3e-03x57=-4e-03x58=-3e-03x633=-2e-03xn6=-2e-03x7=-4e-03x8=-3e-03xn57=-3e-03x59=3e-03x61=-3e-03x63=2e-03x280=-2e-03x282=2e-03x284=-3e-03x286=2e-03xn56=4e-03x57=4e-03x109=-2e-03x110=-4e-03x111=-3e-03x115=3e-03x117=-3e-03x120=2e-03x262=2e-03x268=-2e-03x270=3e-03x271=2e-03x275=-2e-03x278=-7e-03x279=-7e-03x280=-3e-03x282=-2e-03x283=-4e-03x284=-3e-03x287=3e-03x288=4e-03x289=3e-03x298=-2e-03x299=-5e-03x300=-3e-03x302=2e-03x315=4e-03x316=4e-03x326=2e-03x340=-3e-03x356=-2e-03x357=-3e-03x384=3e-03x395=2e-03x423=-2e-03xn5=-4e-03x8=2e-03x119=3e-03x121=-5e-03x122=-5e-03x126=-2e-03x129=-2e-03x130=-3e-03x245=-2e-03x249=-3e-03x250=-4e-03x292=-4e-03x293=-4e-03x294=-3e-03x298=-3e-03x299=-3e-03x309=-3e-03x311=3e-03x322=4e-03x323=2e-03xn121=-2e-03x124=-6e-03x125=-4e-03x126=3e-03x127=6e-03x249=5e-03x251=-5e-03x252=-7e-03x303=-3e-03x312=-3e-03x314=2e-03x315=3e-03xn120=3e-03x121=7e-03x122=6e-03x124=-4e-03x126=3e-03x131=-3e-03x134=2e-03x241=3e-03x242=4e-03x243=3e-03x248=3e-03x253=-4e-03x254=-5e-03x256=4e-03x258=-3e-03x264=2e-03x291=2e-03x302=3e-03x303=2e-03x313=-2e-03x314=-2e-03xn118=2e-03x120=-3e-03x121=-4e-03x122=2e-03x123=6e-03x124=4e-03x125=2e-03x126=3e-03x241=3e-03x242=3e-03x244=-3e-03x248=2e-03x249=4e-03x252=2e-03x253=3e-03x255=-3e-03x262=-3e-03x266=-2e-03x302=3e-03x305=-3e-03x314=3e-03x316=-3e-03xn119=2e-03x120=6e-03x121=5e-03x123=-7e-03x124=-3e-03x243=-5e-03x244=-5e-03x246=2e-03x254=3e-03x255=4e-03x256=2e-03x258=-3e-03x292=-2e-03x303=-3e-03x305=2e-03x306=3e-03x314=2e-03x318=-2e-03xn122=-4e-03x123=-9e-03x124=-6e-03x126=3e-03x246=4e-03x247=6e-03x259=3e-03x260=4e-03x261=4e-03x309=-4e-03x310=-4e-03x323=-2e-03xn121=4e-03x123=-6e-03x124=-7e-03x134=-3e-03x245=-4e-03x246=-4e-03x258=3e-03x259=4e-03x269=3e-03x271=-3e-03x272=-3e-03x308=4e-03x315=3e-03x319=4e-03xn21=-2e-03x118=3e-03x119=6e-03x121=-5e-03x122=-6e-03x241=-3e-03x289=-2e-03x299=-2e-03x300=-4e-03x302=2e-03x314=-3e-03xn15=-2e-03x120=8e-03x121=8e-03x123=-3e-03x241=3e-03x242=3e-03x298=-2e-03x301=-3e-03x302=-4e-03x304=4e-03x316=2e-03xn20=3e-03x117=-3e-03x120=-4e-03x121=-8e-03x122=-4e-03x124=3e-03x126=3e-03x295=3e-03x304=3e-03x305=2e-03x310=-2e-03x315=-3e-03x316=-2e-03xn113=2e-03x116=3e-03x117=8e-03x118=5e-03x120=-4e-03x121=-3e-03x129=-2e-03x237=-2e-03x291=-3e-03x296=-3e-03x297=-2e-03x306=-3e-03xn19=-3e-03x20=-3e-03x57=3e-03x113=-3e-03x114=-3e-03x271=2e-03x273=-2e-03x287=3e-03x288=3e-03xn49=3e-03x51=-4e-03x52=-2e-03xn20=-2e-03x38=-3e-03x40=4e-03x42=-4e-03x43=-6e-03x44=-4e-03x47=4e-03x48=2e-03x50=2e-03xn679=2e-03x683=2e-03x685=-3e-03x687=2e-03x689=-2e-03x691=2e-03xn" | awk 'BEGIN{L=2400;N=8192;for(n=0;n0?(R>32767?32767:R):(R<-32768?32768:65536+R)));}else if(n<2*L){o[n-L+1]=R;}}}' | awk '{a[i++]=$0}END{while(j

     Listen to the audio output of the above command: 9000-uncompressed.wav

     The audio encoding format is a plain text format that uses various delimiters to separate important information.  For example, here is a sample of encoded audio data that plays one interval with a 220 hz tone, followed by a 440hz tone, followed by a 110hz tone played at the same time as a 220hz tone:

220=1
440=1
220=1x110=1

     In this case, the intervals are separated by a newline, and the individual tones in an interval are separated by an 'x' character.  Here is an example of how this information can be fed into the audio decoder to produce actual sounds:

echo -en "220=1n440=1n220=1x110=1" | awk '{l=split($1,t,"x");for(j=0;j1?1:(R<-1?-1:R));printf("%04Xn", and(0xFFFF, R*0x7FFF));}}' | tac | tac | xxd -r -p | aplay -c 1 -f S16_BE -r 48000

     Listen to the audio output of the above command: sample-tone.wav

     In the above example, a few of the numerical constants in the audio decoder were set manually to specific values.  In practice, they will be different for each audio sample that is produced with the audio codec tool provided here.  These constants are things like the sample rate of the audio, and the number of samples produced per interval.

     In order to repeat this process for a more complex waveform such as human speech, a Fourier transform is performed to determine what frequency components are most important for representing the signal in any given interval.  Once these important frequency components are determined, they are included along with a numerical weighting that describes exactly how important they are.  In the previous trivial example, all the weightings were set to 1.

     Here is a tiny sample of the encoded audio data for a more complex waveform produced with the audio encoding tool.  Note that the decimal weightings are represented in E notation to save space:

0=0
0=0
0=0
147=-2e-03x148=-2e-02x149=-1e-02x150=7e-04
128=-3e-03x129=-1e-02x136=-1e-02x137=-1e-02x138=2e-04
145=-6e-03x146=3e-03x147=9e-03x149=1e-02x150=1e-02x151=1e-03

     E notation is used by default to describe the weightings of frequency components in the audio codec tool.  A detailed analysis of the audio decoder can be found in the following awk script.  This awk script expects E notation frequency weights as shown in the previous example:

awk '
	
	

	
	
	BEGIN{
		
		
		L=2205;
		
		
		
		
		
		
		
		
		
		
		N=8192;
		
		
		
		
		
		for(n=0;n;n+=1)
			o[n+1]=0;
	}
	{
		
		
		
		
		
		
		l=split($1,t,"x");
		
		for(j=0;j;j+=1){
			
			
			

			
			
			
			
			
			
			
			split(t[j+1],p,"=");
			
			
			
			split(p[2],z,";");
			
			x[j+1]=z[1];
			
			y[j+1]=z[2];
			
			f[j+1]=p[1];
		}
		
		
		
		
		
		
		
		
		
		
		
		
		
		M=(NR==64?349:2*L);
		for(n=0;n;n+=1){
			
			
			
			R=0;
			I=0;
			
			for(j=0;j;j+=1){
				
				a=cos(2*3.141592*f[j+1]*(n/N)*1.000000);
				
				b=sin(2*3.141592*f[j+1]*(n/N)*1.000000);
				
				
				R+=(x[j+1]*a)-(y[j+1]*b);
				I+=(x[j+1]*b)+(y[j+1]*a);
			}
			
			if(n=(R*(n/L))+(o[n+1]*((L-n)/L));
				
				
				R*=2.000000;
				
				
				R=int(R*32767);
				
				
				
				
				
				
				
				
				printf("%04Xn",R==0?0:(R>0?(R>32767?32767:R):(R<-32768?32768:65536+R)));
			}else if(n<2*L){
				
				
				
				o[n-L+1]=R;
			}
		}
	}
'

     The terminal environment presents unique challenges when it comes to compressing data:  The resulting compressed data must be easily typed (or copied) into the terminal, and one must also be considerate of various control characters that have special meaning to the shell environment itself.  Certain characters like '*' can interfere with the expression of the command itself when the shell treats them as control information, but the user's intention is to use them as data.  Similar considerations exist for non-printable characters, backticks, single quotes etc.  Binary data such as nulls or extended ASCII characters which may not be portable are also off limits.

     It can be observed that the information we need to compress consists only of the following characters in all cases: {'0','1','2','3','4','5','6','7','8','9','e','-','=', '.', '+','n','x'}.  This means that other printable characters are freely available to express compressed versions of these fundamental characters that represent our data.  Furthermore, it was necessary to consider only a method of decompression that could be done quickly and with very little code.  The decompressor code must be very small, otherwise any benefit that compression would gain would be defeated by the increased code required to represent the decompressor in a single command.  Including the decompressor along with the command is essential since the meme would have little value if it required installing external tools.

     The solution that was applied resembles the use of grammar rules in constructing a programming language.  For example, the character 'A', which does not appear in the output, can be used to represent the character sequence 'e-0' for a savings of 2 characters for every occurrence (not considering the text added to the decoder).  The use of tries was employed to find the most common substrings in the input.  Furthermore, this calculation was repeated for a small number of substring lengths in order to greedily calculate the highest benefit string substitution.  This process was repeated until all available printable characters were associated with a maximum benefit replacement rule.  Once no further substitutions were possible, the compression ends.  This algorithm is by no means the most space efficient, but it was well suited to the task since it reduced the total command size by about 50% in most cases without being too complex or slow in decompressing.

     Here is a detailed explanation of an example of the de-compression algorithm written in awk:

awk 'function d(f,k,a,b,o,w,i,q,h,j){
	
        
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	
	for(k=32;k<127;k++)
		o[sprintf("%c", k)]=k;
	
	
	
	
	
	
	split("e-0,,, 2x,,#40, 3x,,=-,=2#,(2,+,,-,.,&40,0,1,2,3,4,5,6,7,8,9, 2n,;,=2%,=,*#,0=0,38,=3,*%,(3, 4x,=4,39,=5,?n,21,=2,=1,=7&, 3n,(1,/5,(4,79,%1,A#,(5,(7&,C#,%5,@7,=9&,B5,22,,J:,*:,<5,,%0,@8,26,(6,e,HH,13,42,30,K#,%2,D40,44,,43,@3,)F,3Q,>7,P#,/0,=8&,80,x,41,3>,E#,E&,J&,,",a,",");

	
	
	split("0,1,2,3,4,5,6,7,8,9,=,e,x,+,-,.,n,;",w,",");

	
	
	q=0;
	
	for(g in w)
		
		if(o[w[g]]==f)
			q++;
	
        
        
	
	i=f-32+1;
	
	if(q>0)
		
		
		printf(f==110?"n":a[i]);
	else{
		
		
		h=split(a[i],b,"");
		for(j=0;j;j++)
			d(o[b[j]]);
	}
}
d($1) 
'

     Different Unix systems have different default methods of interacting with audio drivers.  This is why the -k option has been included in the Fast Meme Transform tool to make it easy to create memes that can work of a number of different systems.  By default, the meme transform tool assumes that you have aplay installed, and this is usually is by default on Ubuntu.  For other platforms, you may be able to use the 'sox' program if you specify an audio driver.  If your meme doesn't work at first due to a 'command not found', try again with a different -k option.  As of this writing, valid values are currently 0, 1, 2, 3:

./fmt -f 9000.wav -t 99999999 -k 0
./fmt -f 9000.wav -t 99999999 -k 1
./fmt -f 9000.wav -t 99999999 -k 2
./fmt -f 9000.wav -t 99999999 -k 3

     The output from the above commands will look something like this:

... | xxd -r -p | sox -q -r 48000 -b 16 -B -e signed -c 1 -t raw  - -t `sox -h|grep DRIVERS|sed 's/^.*: ([^ ]+).*$/1/g'`
... | xxd -r -p | aplay -q -c 1 -f S16_BE -r 48000
... | xxd -r -p | sox -q -r 48000 -b 16 -B -e signed -c 1 -t raw  - -t coreaudio
... | xxd -r -p | sox -q -r 48000 -b 16 -B -e signed -c 1 -t raw  - -t alsa

     There are two mathematical observations that can give us a bit of compression for free and help to reduce the output size.  The first reduction reduces the size by half losslessly, and the second reduction cuts the size in half again, but this time the reduction is lossy.  The first observation is related to the fact that our input signal is purely real and this means that the output has symmetries that aren't needed to re-construct the original signal.  More precisely, in the frequency domain X(-w) = conjugate(X(w)) when x(n) is purely real.  The trick is related to the fact that we can simply drop the imaginary part of the Fourier coefficients only mild distortion of the signal.

     These two topics are explored in much greater detail in Fourier Transform Coefficients Of Real Valued Audio Signals.

     The effects of endpoint discontinuities and spectral leakage were also explored.  Mitigation techniques such as use of the Hann Window were explored.  The act of zero padding as is done in the fast Fourier transform has an effect on the spectral leakage which was also explored.

     This analysis ended up being quite long, so it was split out into another article specifically on Endpoint Discontinuities and Spectral Leakage.

     One problem that was encountered initially with re-constructing the audio was the presence of high frequency popping sounds from artifacts that occurred around the endpoints of each sampling window in the re-constructed signal.  It is quite unpleasant to listen to, and you can enable this as a feature with the fast meme transform tool:

./fmt -f my_file.wav -b

     Listen to the audio output of the above command: 9000-popping.wav

     It sounds awful!

     This problem occurs when filtering is applied in the frequency domain on independent sample intervals without any communication between the intervals once they are reconstructed in the time domain.  Since a multiplication in the frequency domain corresponds to convolution in the time domain, it is necessary to communicate some information from the previous sample into the next sample whenever a filter has been applied.  The amount of information that needs to be communicated depends on the length of the filter function that is convolved with the input signal.  In the trivial case, the filter function could be a simple impulse with value 1 and length 1.  In this case, no information would need to be communicated between intervals.  This should be intuitive, since multiplying against an impulse filter function in the frequency domain wouldn't change the frequencies at all, and your resulting filtered time domain signal would simply be the same as taking the Fourier transform, them immediately taking the inverse transform.  In this case, no extra information from other intervals is necessary.

     There are two popular methods to achieve proper filtering in the frequency domain:  Overlap Add and Overlap Save.  The fast meme transform tool uses a variant of the overlap add algorithm.

     The variant of overlap add that was employed in the fast meme transform tool is different from the classical overlap add algorithm in that it only adds in contributions from the previous interval instead of all previous overlapping intervals.  In addition, the classical overlap add algorithm will save (len(h) -1) items to be added into values in the next interval, but the version in the meme transform tool only saves up to L values.  Finally, since the meme transform tool employs lossy compression and non-multiplicative ad-hoc changes to the frequency domain, the use of pure overlap add or overlap save can still produce endpoint discontinuities.  Therefore, a smoothing function is employed:  For each interval of length L in the re-constructed signal, the weighting of the output is linearly shifted from considering the overlap audio from the previous interval, to the audio calculated in the current interval.  In practice, this completely solved the issue with endpoint discontinuities even in the presence of high amounts of distortion due to dropping frequencies with low power contributions.

     Here are some of the command-line options that come with the Fast Meme Transform Tool.  Note that if you run the tool without any -f parameter, it will wait for input to come from standard in.

-k, --command-type

A single number that will determine which type of command will be generated to play the final audio. You may need to use this option to generate a command that will play in your operating system without installing additional tools. 0 = use sox to automatically detect audio driver (Tested to work on Ubuntu 16), 1 = use aplay (tested on Ubuntu 16) 2 = sox with coreaudio (Probably works on a Mac, but I don't own one so I can't test this), 3 = sox with alsa.

-h, --help

Display this help menu.

-s, --start-time

The start time in seconds where the output audio clip will begin.

-e, --end-time

The end time in seconds where the output audio clip will end. An end time of zero indicates end of the clip.

-v, --verbose

Be verbose.

-f, --file

The file name of the input PCM .wav file to be processed.

-u, --disable-compression

Disable compression of the output Fourier coefficients.

-w, --enable-hann-window

Enable the application of a Hann window to each audio sample period.

-i, --include-imaginary

Include the imaginary parts of the coefficients of the Fourier transform output.

-a, --full-power-spectrum

Include the entire range of frequencies from the Fourier transform output instead of just including up to N/2.

-m, --volume

A floating point number describing the output volume.

-b, --enable-endpoint-discontinuities

Enable endpoint discontinuities that occur at the ends of sampling intervals (see -p flag). Enable this option if you want the presence of annoying high-frequency popping noises in your audio.

-t, --threshold

A floating point number that describes the cutoff power level, below which any frequency with less than this quantity of power will not be included in the output. If you make this number high, the output will be small, but the audio will sound terrible. If you make this value low, the output will be huge and slow to decode, but the result will sound better. If you make this value large, the output will be small but the sound quality will decrease.

-p, --sample-time-period

A floating point number that describes how much long each sub-sample will be when the audio is broken up into small sections to have a Fourier transform applied to it.

-n, --float-precision

An integer describing how many digits should be printed after the decimal after for each Fourier coefficient that appears in the output.

-l, --filters

A list of length 3*n that one or more sequence of a single character followed by two numbers that describe the parameters of either a low pass or high pass biquad filter. Each biquad filter will be applied one after another on the waveform before the Fourier transform is calculated. For example, using '-l l 5 500' will apply a lowpass filter with a Q value of 5, and a cutoff frequency of 500. The extra character 'l' or 'h' is for low pass and high pass respectively.

-c, --pitch

A positive number that can be used to either shift the frequency up or down. Set this value greater than one to increase the pitch, and set it between 0 and 1 to decrease the pitch.

     In this article, it has been shown that you can make use of the Fourier transform to build self-contained Linux commands that play arbitrary audio including human speech.  A tool called the Fast Meme Transform that includes many options for meme configuration was presented.  A number of different techniques were employed to make the listening and meme distribution experience as good as possible, such as the use of the overlap add algorithm, and ASCII text based compression.  Many of the details are not presented in this article, but have instead been moved into articles of their own:

Leave a Reply

Your email address will not be published. Required fields are marked *