In this post we’ll show how to leverage an authenticated proxy (with login/password) for a Java application.
Category: Development
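The excerpt above only announces the post; as a rough sketch (assuming the standard java.net Authenticator and the proxy system properties; the host, port and credentials below are placeholders), the setup might look like this:
import java.net.Authenticator;
import java.net.PasswordAuthentication;

public class AuthProxyExample {
    public static void main(String[] args) {
        // Placeholder proxy coordinates -- replace with your own
        System.setProperty("http.proxyHost", "proxy.example.com");
        System.setProperty("http.proxyPort", "8080");
        System.setProperty("https.proxyHost", "proxy.example.com");
        System.setProperty("https.proxyPort", "8080");
        // Since Java 8u111, Basic auth for HTTPS tunneling is disabled by default;
        // clearing this property re-enables it
        System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");

        // Supply the proxy login/password for the 407 proxy authentication challenge
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication("proxyUser", "proxyPass".toCharArray());
            }
        });

        // Any HttpURLConnection request made from here on goes through the proxy
    }
}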
In this post I’ll share how I added a Let’s Encrypt SSL certificate to a subdomain on a VPS running CentOS 7 with Vesta CP.
This post is devoted to the steps of creating a subdomain (CentOS 7 and Vesta CP) and mapping a [Laravel] project folder to it.
- Remove the previous git origin:
git remote remove origin
- Add a new origin with the PAT (<TOKEN>):
git remote add origin https://<TOKEN>@github.com/<USERNAME>/<REPO>.git
- Push once with --set-upstream:
git push --set-upstream origin main
Now you can push changes to the remote repo without adding the PAT to the push command every time.
If you need to create a PAT, use the following tutorial.
Suppose there is a table like the one below (one data row only):
Blows per Minute (BPM) | Speed (RPM) | Power | Flow | Tool Sys
---|---|---|---|---
0-2500 | 0-250 | 1.8 HP | 2.6-13.2 GPM | SDS Max
How to scrape it using cheerio.js as a parser?
Case 1 (1 row only)
node.exe index.js > scrape.log 2>&1
When executing index.js this way, we redirect all console output (console.log() as well as errors, thanks to 2>&1) from the console into the file scrape.log.
Remove empty HTML tags recursively
Sometimes we have code with HTML tags that contain nothing but whitespace characters. Often those tags are nested. See the code below:
<div>
<div>
<div></div>
</div>
</div>
What regex might be used to find and remove those tags?
The obvious solution is `<div>\s*?<\/div>`. Here `\s` stands for “whitespace character” and is equivalent to `[ \t\n\x0B\f\r]`. That is, `\s` matches a space, a tab (`\t`), a line break (`\n`), a vertical tab (`\x0B`, sometimes referred to as `\v`), a form feed (`\f`) or a carriage return (`\r`).
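As a quick illustration (added here, not from the original post; it assumes Java’s String.replaceAll and the nested sample above), a single pass with this pattern removes only the innermost empty tag, which is why it has to be applied repeatedly:
String html = "<div>\n<div>\n<div></div>\n</div>\n</div>";
// One pass strips only the innermost <div></div>; the outer divs are
// now empty but still present, so another pass is needed
String once = html.replaceAll("<div>\\s*?</div>", "");
// once == "<div>\n<div>\n\n</div>\n</div>"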
General case
In the general case we use the following regex: `<(?<tag>[a-z]+?)( [^>]+?|)>\s*?<\/(\k<tag>)>`, where `tag` is a named capture group that matches the tag name via `[a-z]+?`.
Java code
To apply it recursively we might use the following method:
// requires java.util.regex.Matcher and java.util.regex.Pattern
public static String removeEmptyTags(String html) {
    boolean compareFound = true;
    Pattern pattern = Pattern.compile("<(?<tag>[a-z]+?)( [^>]+?|)>\\s*?</(\\k<tag>)>",
            Pattern.MULTILINE | Pattern.DOTALL);
    while (compareFound) {
        compareFound = false;
        Matcher matcher = pattern.matcher(html);
        if (matcher.find()) {
            compareFound = true;
            // strip every currently-empty tag, then loop again for tags
            // that became empty once their children were removed
            html = matcher.replaceAll("");
        }
    }
    return html;
}
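For example, fed the nested sample from above (an illustration added here, not part of the original snippet), the loop empties the markup completely:
String html = "<div>\n<div>\n<div></div>\n</div>\n</div>";
System.out.println(removeEmptyTags(html)); // prints an empty string: all three divs are removed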
How do we handle cookies, the user-agent and headers when scraping with Java? For this we’ll use a static class ScrapeHelper
that handles all of it. The class uses Jsoup library methods to fetch data from the server and parse the HTML into a DOM document.
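The original ScrapeHelper class isn’t included in this excerpt; a minimal sketch of the same idea, assuming the Jsoup Connection API (the user-agent string and the timeout are placeholders), might look like this:
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.util.Map;

public class ScrapeHelper {
    // Fetch a page with custom headers and cookies and parse it into a DOM Document
    public static Document fetch(String url,
                                 Map<String, String> headers,
                                 Map<String, String> cookies) throws IOException {
        Connection connection = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)") // placeholder user-agent
                .headers(headers)   // e.g. Accept, Accept-Language, Referer
                .cookies(cookies)   // session cookies captured from a previous response
                .timeout(10_000);   // 10-second timeout, an arbitrary choice
        return connection.get();    // GET the page and parse the HTML
    }
}
Callers can then query the returned Document with the usual Jsoup selectors (document.select(...)).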
Suppose we have the following array:
arr = [[ 5.60241616e+02,  1.01946349e+03,  8.61527813e+01],
       [ 4.10969632e+02,  9.77019409e+02, -5.34489688e+01],
       [ 6.10031512e+02,  9.10689615e+01,  1.45066095e+02]]
How do we print it with rounded elements using map() and lambda functions?
l = list(map(lambda i: list(map(lambda j: round(j, 2), i)), arr))
print(l)
The result will be the following:
[[560.24, 1019.46, 86.15],
[410.97, 977.02, -53.45],
[610.03, 91.07, 145.07]]

Sequentum Enterprise is a powerful, multi-featured enterprise data pipeline platform and web data extraction solution. Sequentum’s CEO Sarah McKenna doesn’t like to call it web scraping because, in her description, web scraping refers to many different types of unmanaged and non-compliant techniques for obtaining web-based datasets.