
Docker Automated Build by Example - OpenGrok

Why am I so obsessed with OpenGrok? Probably because it is the only decent code-auditing tool on the market.

When I came across this project about half a year ago, I casually wrote a build script for it (honestly, shell has disgusted me for a long time). Back then I got it running in Docker and just left it there; for various reasons it never came in handy. Recently, while tinkering with a lot of things, I started it with docker run on an old student laptop that had been idle for years, opened the page, and got a 404?? 😅 Not a happy moment. With no other choice, I revisited the Dockerfile I had written back then, and this time I actually implemented the features I had wanted from the start, including but not limited to: one-click deployment, automatic index updates, automatically fetching the latest release, building fixed historical versions with CI, and so on.

So here is a record of the pitfalls I fell into while writing the Dockerfile.

Environment variables: ARG vs ENV

According to the official documentation [1] and a Stack Overflow answer [2], ARG is a build-time variable, while ENV is also a runtime environment variable. The word "variable" is particularly confusing here: the two are not the same kind of thing at all!

A variable defined by ARG is more like parameter expansion in the shell: when the Dockerfile is read, references to it in instructions such as ENV, COPY or WORKDIR are replaced with its value by Docker itself, and it only exists for the duration of the build. RUN commands do see it as an environment variable while the image is being built [1], but it is not saved into the image, so the program started by ENTRYPOINT or CMD when the container runs cannot see it at all.

ENV is the opposite. A variable defined with ENV is saved into the image and can be read by every program in the container: it is a true *environment variable*. It is also visible on the command lines run by RUN and ENTRYPOINT (shell form), because the shell expands forms such as ${PARAM} during parameter expansion. Here is an example.

ARG THEARG=argg
ENV THEENV=envv
RUN echo $THEARG $THEENV

During the build, the RUN step is roughly equivalent to running the following in the image (note the single quotes: both variables are expanded by the inner sh, which receives them from its environment):

THEARG=argg THEENV=envv sh -c 'echo $THEARG $THEENV'

whereas a container started from the finished image only has THEENV=envv in its environment.
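The quoting in such sh -c one-liners is easy to get wrong: a prefix assignment like THEENV=envv only reaches the child's environment, while anything inside double quotes is expanded by the calling shell first. A small bash demo, unrelated to Docker itself (the inner/outer names are just for illustration):

```shell
#!/usr/bin/env bash
# Where does $THEENV get expanded? Plain bash, no Docker involved.
unset THEENV

# Single quotes: the child sh expands $THEENV from its own environment,
# where the prefix assignment THEENV=envv has put it.
inner=$(THEENV=envv sh -c 'echo "[$THEENV]"')

# Double quotes: the calling shell expands $THEENV before sh starts; there
# THEENV is unset, so sh only ever receives: echo "[]"
outer=$(THEENV=envv sh -c "echo \"[$THEENV]\"")

echo "inner=$inner"   # inner=[envv]
echo "outer=$outer"   # outer=[]
```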

How do you pass an environment variable from the host machine into Docker? Combine the two [2]. For example, the script run during docker build needs to know whether it is running on Travis.

ARG CI=false
ENV CI=$CI
RUN /your/path/to/install

With this, running docker build --build-arg CI=$CI in .travis.yml lets the install script read the CI variable and tell whether it is running inside a CI pipeline.
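On the script side, reading the variable could look like the sketch below. It is a hypothetical excerpt of /your/path/to/install: the branch actions are placeholders, and treating anything other than the exact string true (which Travis sets) as "not CI" is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical excerpt of /your/path/to/install. Travis CI sets CI=true; the
# ARG CI=false / ENV CI=$CI pair in the Dockerfile passes it through.
is_ci() {
  # Treat an unset CI the same as CI=false
  [ "${CI:-false}" = "true" ]
}

if is_ci; then
  echo "CI build"     # placeholder: e.g. build a fixed historical version
else
  echo "local build"  # placeholder: e.g. fetch the latest release
fi
```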

Layered FS

First, a diagram from the official documentation:

[Figure: container layers, https://docs.docker.com/engine/userguide/storagedriver/images/container-layers.jpg]

Docker's file system has a layered architecture. Every instruction in the Dockerfile adds a layer to the image, even ENTRYPOINT and EXPOSE (those show up in docker history as 0 B entries). Each layer builds on the one below it, and this chain of layers is the foundation of Docker's build cache.

There is a big catch with this layered structure. Because each layer depends on the ones below it, a lower layer is never changed once it has been committed; every change is recorded in a new layer on top [3]. Likewise, a file committed in a lower layer **cannot really be deleted** by an upper layer: the upper layer only hides it, and the file still takes up space in the image. So when building an image, clean up temporary files in the same layer (the same RUN instruction) that creates them, such as package-manager caches, downloaded archives and extracted temporary files; otherwise the image size balloons quickly.
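To keep a downloaded archive out of every layer, the download, extraction and cleanup have to happen inside one RUN instruction. A minimal sketch, assuming curl and tar are available in the image; install_tarball is a hypothetical helper, not part of the original build script.

```shell
#!/usr/bin/env bash
# Download, extract and clean up inside ONE step, so the archive never gets
# committed into an image layer. install_tarball is a hypothetical helper.
# The body is a ( ) subshell, so the EXIT trap and "exit" stay local to it.
install_tarball() (
  url=$1
  dest=$2
  tmp=$(mktemp -d) || exit 1
  # Remove the temporary directory whenever this subshell exits
  trap 'rm -rf "$tmp"' EXIT
  curl -fsSL "$url" -o "$tmp/release.tar.gz" &&
    mkdir -p "$dest" &&
    tar xzf "$tmp/release.tar.gz" -C "$dest"
)

# e.g. in a script run by a single RUN instruction:
# install_tarball "$URL" /opt
```

Using && instead of set -e is deliberate: set -e is silently ignored when the function is called as part of an || or if condition.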

Making good use of the build cache

Because the file system is layered, during docker build Docker assumes that as long as every step so far is exactly the same, the result must be the same too. You can think of Docker as keeping a save point after every build step (laughs), so that if a step fails you can go back to the last save and try again. So across versions, the steps whose results differ should be placed as late as possible: that way as many layers as possible are identical, which saves time and space when other people pull the image. For example, in this build the versions differ in only one step and the first 10 layers all hit the cache, so the total build time for 24 versions dropped from 11 minutes to 6 minutes. When users pull, they only download the last, differing layer.
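One way to exploit this when building many versions is to pass the version as a build argument and loop over the tags. A sketch, assuming a hypothetical ARG OPENGROK_VERSION in the Dockerfile; the image tags are made up as well. According to the Dockerfile reference, RUN instructions after an ARG use it implicitly and a changed value causes a cache miss, so the ARG is best declared right before the version-specific step rather than at the top. DOCKER=echo turns the loop into a dry run.

```shell
#!/usr/bin/env bash
# Build one image per release tag. Layers built before the first step that
# uses the (hypothetical) ARG OPENGROK_VERSION can be reused from the cache.
# Set DOCKER=echo to print the commands instead of running them.
build_all_versions() {
  local docker="${DOCKER:-docker}" tag
  for tag in "$@"; do
    "$docker" build --build-arg "OPENGROK_VERSION=$tag" -t "opengrok:$tag" . || return 1
  done
}

# e.g. build_all_versions 1.0 1.1
```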

Hosts

GitHub releases hadn't been blocked yet back then… releases had a fixed CNAME back then… there was a 16-core, 32 GB buildbot back then… but the most bitter thing about being behind the Great Firewall is the connection to the outside world. Yet during a build you cannot set hosts, because /etc/hosts is not really a file in the container at all [4]. According to what people say online, Docker uses a mechanism similar to Magisk's magic mount to mount hosts and resolv.conf dynamically, and their contents are specified at docker run time. But what do you do during a build?

Then the only option is to write the hosts entries again in every step of the script~

Anyway, in the end I didn't touch hosts and gave up.
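For reference, the per-step workaround could be sketched like this. Edits to /etc/hosts are not saved into the image, so the entry has to be written in the same RUN step that uses it. with_hosts_entry and HOSTS_FILE are hypothetical names, and 203.0.113.10 is a documentation-only placeholder address, not a real mirror.

```shell
#!/usr/bin/env bash
# /etc/hosts edits are not kept in the image, so write the entry in the same
# step that needs it. with_hosts_entry and HOSTS_FILE are hypothetical;
# 203.0.113.10 is a documentation-only address.
with_hosts_entry() {
  local ip="$1" host="$2"
  shift 2
  local hosts_file="${HOSTS_FILE:-/etc/hosts}"
  # Append the mapping only if that exact line is not there yet
  if ! grep -qxF "$ip $host" "$hosts_file" 2>/dev/null; then
    echo "$ip $host" >> "$hosts_file" || return 1
  fi
  # Run the actual command (curl, git, ...) with the entry in place
  "$@"
}

# e.g. inside a single RUN instruction:
# with_hosts_entry 203.0.113.10 github.com curl -fsSLO "$URL"
```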

Fucking shell language

Believe me, this is definitely the most powerful yet most garbage glue language in the universe!

Mastering shell tricks does me no good whatsoever!

Once you've written the code, you'll never want to read it a second time!

No one else will ever understand it!

Even yourself!

It just works!

In fact, this is what I have always wanted to say: using shell once a day is enough to make me sick. Attached are two fresh, steaming pieces of crap (xiang, 翔) that I wrote just today.

Xiang 01

[[ "$URL" =~ .zip ]] && mv $TARBALL $TARBALL.zip && unzip -p $TARBALL.zip > $TARBALL
tar xzf $TARBALL -C / || { echo "Download failed! exiting.."; exit 1; }

The pitfalls hidden in these two lines:

  • If the string to the right of the =~ operator is not enclosed in double quotes, it is treated as a regular expression; if it is quoted, every character is matched literally. So the unquoted .zip here matches any character followed by zip, anywhere in $URL.
  • Expanding $TARBALL in $TARBALL.zip can go wrong in a hundred ways:
  • If there is also a $TAR variable and you want the value of TAR followed by the literal string BALL.zip, you must write ${TAR}BALL.zip. Without curly braces, the shell takes the longest valid variable name it can find (here TARBALL) and substitutes its value, even if that value is empty.
  • If $TARBALL contains spaces or special characters such as newlines, they are faithfully substituted into the command and split it into extra arguments. For example, with TARBALL='poxn hub.avi', the command becomes mv poxn hub.avi poxn hub.avi.zip, whose meaning has changed: it now moves the three files poxn, hub.avi and poxn into hub.avi.zip. The target is not a directory, so it is bound to fail. The fix is to wrap the variable in double quotes ("$TARBALL"), which keeps any spaces inside it as part of a single argument.
  • unzip -p writes the *file contents* to stdout! And I redirect that into a file with >. If the archive contains more than one file, all of them get concatenated into one file, which is not what we want most of the time.
  • Stupid bash! Bash (zsh does not have this problem) has special requirements for command groups enclosed in braces:
    • There must be a space after the opening brace, otherwise it is a syntax error
    • The last command must be terminated by a semicolon (or a newline)
    • You might reach for (echo "fuck" && echo "you") instead (spaces and semicolons don't matter there) to replace { echo "fuck" && echo "you";}, but parentheses are not a logical expression: ( list ) runs the list in a subshell, as documented under Compound Commands in the bash manual. That matters here: in ( echo "Download failed! exiting.."; exit 1 ), the exit only leaves the subshell, and the script keeps running.
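Putting those fixes together, a more defensive version of the two lines might look like this. It is a sketch: extract_release and its dest parameter are hypothetical, it swaps the =~ regex test for a == glob match anchored at the end of the URL, and it keeps the original's assumption that the zip wraps a single tarball.

```shell
#!/usr/bin/env bash
# Defensive rewrite of "Xiang 01" as a function (hypothetical name/params).
extract_release() {
  local url="$1" tarball="$2" dest="${3:-/}"
  # Glob match: true only if the URL really ends in ".zip"
  if [[ $url == *.zip ]]; then
    # Quote every expansion so paths with spaces stay one argument
    mv -- "$tarball" "$tarball.zip" || return 1
    # unzip -p concatenates ALL members to stdout; only correct for a zip
    # that wraps exactly one tarball, as the original script assumes
    unzip -p "$tarball.zip" > "$tarball" || return 1
  fi
  # { ...; } runs in the current shell, so "return" really leaves the function
  tar xzf "$tarball" -C "$dest" || { echo "Download failed! exiting.." >&2; return 1; }
}
```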

Xiang 02

URLS_PATTERN='https?:(?:(?!p5p|pkg|src).)*(\.tar\.gz|\.zip|\.tar\.gz\.zip)'

This regular expression matches URLs ending in .tar.gz, .zip, or both (.tar.gz.zip), and excludes URLs that contain p5p, pkg or src.

Handwritten regex: as long as it works, it's fine. Anyway, two months later I can no longer understand it. 🙄

FILE=/tmp/releases.txt
curl 'https://api.github.com/repos/OpenGrok/OpenGrok/releases' -o "$FILE"
tags=($(grep 'tag_name' "$FILE" | cut -f4 -d\"))
urls=($(grep -Po "$URLS_PATTERN" "$FILE"))

These lines query the GitHub API, save all the tag names into the tags array, and stuff every URL that matches the pattern above into the urls array.

See how the arrays are declared and the tags extracted? 🤓

  • Wrapping an unquoted command substitution in parentheses splits its output into an array of strings on whitespace (more precisely, on the characters in $IFS), and each resulting word is also subject to filename globbing.
  • Because $IFS contains space, tab and newline by default, a tag or URL that contains a space gets cut in two at that space. Solution: set IFS=$'\n' before these two assignments.
    • Oh, by the way: to put a real newline character into a string, as in the IFS assignment above, you need single quotes with a $ in front ($'...'). Written as "\n" in double quotes or without quotes, it is not turned into a newline. Both parts are indispensable! Why? It's a feature!
  • To iterate over an array, use for tag in "${tags[@]}". Strangely, the quotes here do not concatenate the elements into one string, and in the demo below they even seem to be *ignored*. In fact "${arr[@]}" is a special form that expands to each element as a separate word, so the quotes only make a difference when an element contains whitespace, which none of the elements in this demo do. In the demo, set -x prints each command as the shell actually executes it after expansion, prefixed with a plus sign.
bash-3.2$ s=(one two)
bash-3.2$ set -x
bash-3.2$ echo ${s[@]}
+ echo one two
one two
bash-3.2$ echo "${s[@]}"
+ echo one two
one two
--> the double quotes are gone
bash-3.2$ echo \"${s[@]}\"
+ echo '"one' 'two"'
"one two"
--> two weird single quotes appear (xtrace quoting the literal " characters)
bash-3.2$ echo "aa${s[@]}aa"
+ echo aaone twoaa
aaone twoaa
--> the double quotes are ignored; aa ends up stuck to the first and last elements of the expanded array

bash-3.2$ for u in ${s[@]}; do echo $u; done
+ for u in '${s[@]}'
+ echo one
one
+ for u in '${s[@]}'
+ echo two
two
bash-3.2$ for u in "${s[@]}"; do echo $u; done
+ for u in '"${s[@]}"'
+ echo one
one
+ for u in '"${s[@]}"'
+ echo two
two
--> single quotes again, and the double quotes seem to do nothing at all

bash-3.2$ for u in one two; do echo $u; done
+ for u in one two
+ echo one
one
+ for u in one two
+ echo two
two
bash-3.2$ for u in "one two"; do echo $u; done
+ for u in '"one two"'
+ echo one two
one two
--> without an array, the double quotes work as expected
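Both fixes from the list above in one sketch: an array with an element that contains a space, to show what the quotes actually do, and a line-by-line reader that avoids relying on $IFS at all. This is plain bash; count_args, read_lines and LINES_OUT are made-up names for illustration.

```shell
#!/usr/bin/env bash
# Quotes around ${arr[@]} only show their effect when an element contains
# whitespace, which none of the elements in the demo above do.
s=("one two" three)

count_args() { echo "$#"; }   # prints how many arguments it received

count_args ${s[@]}     # unquoted: re-split on $IFS      -> 3
count_args "${s[@]}"   # quoted [@]: one word per element -> 2
count_args "${s[*]}"   # quoted [*]: all elements joined  -> 1

# Read command output into an array one line per element, without any
# $IFS splitting or globbing (works in bash 3.2, which has no mapfile).
read_lines() {
  LINES_OUT=()
  local line
  # IFS= keeps surrounding blanks, -r keeps backslashes; the -n test also
  # keeps a last line that has no trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    LINES_OUT+=("$line")
  done < <("$@")
}

# e.g. read_lines grep -Po "$URLS_PATTERN" "$FILE"; urls=("${LINES_OUT[@]}")
```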

Why should I have to step into this pit? How is this trick of a "language feature" any different from knowing the four ways to write the character 茴 (hui)? I could still write whole articles on named pipes, parameter substitution, background jobs and coprocesses. But what's the point? None at all!

Enough said.

After all this tinkering, has the image size changed?

Yes, it's about 250 MB larger… (runs away)

A flurry of moves as fierce as a tiger, and here is the result:

# before
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
opengrok            latest              dd08abaa35aa        5 months ago        446.4 MB
# after
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
ttimasdf/opengrok   latest              19a31e242f56        17 minutes ago      695.1 MB

Finally, just grab a shell and go all in!

docker pull ttimasdf/opengrok
docker run -d -v ~/src:/src -p 8080:8080 --name grok ttimasdf/opengrok

References