
Saturday, 19 February 2022

Recommending items to more than a billion people (by Aleksandar Ilic and Maja Kabiljo POSTED ON JUNE 2, 2015 TO Core Data, ML Applications)

 https://engineering.fb.com/2015/06/02/core-data/recommending-items-to-more-than-a-billion-people/

 

The growth of data on the web has made it harder to employ many machine learning algorithms on the full data sets. For personalization problems in particular, where data sampling is often not an option, innovating on distributed algorithm design is necessary to allow us to scale to these constantly growing data sets.

Collaborative filtering (CF) is one of the important areas where this applies. CF is a recommender systems technique that helps people discover items that are most relevant to them. At Facebook, this might include pages, groups, events, games, and more. CF is based on the idea that the best recommendations come from people who have similar tastes. In other words, it uses historical item ratings of like-minded people to predict how someone would rate an item.

CF and Facebook scale

Facebook’s average data set for CF has 100 billion ratings, more than a billion users, and millions of items. In comparison, the well-known Netflix Prize recommender competition featured a large-scale industrial data set with 100 million ratings, 480,000 users, and 17,770 movies (items). There has been more development in the field since then, but still, the largest numbers we’ve read about are at least two orders of magnitude smaller than what we’re dealing with.

A challenge we faced was to design a distributed algorithm that would scale to these massive data sets, and to overcome the issues that arise because of certain properties of our data (such as a skewed item degree distribution, or implicit engagement signals instead of ratings).

As we’ll discuss below, approaches used in existing solutions would not efficiently handle our data sizes. Simply put, we needed a new solution. We’ve written before about Apache Giraph, a powerful platform for distributed iterative and graph processing, and the work we put into making it scale to our needs. We’ve also written about one of the applications we developed on top of it, for graph partitioning. Giraph works extremely well on massive data sets, it is easily extensible, and we have a lot of experience developing highly performant applications on top of it. Therefore, Giraph was the obvious choice for this problem.

Matrix factorization

A common approach to CF is through matrix factorization, in which we look at the problem as having a set of users and a set of items, and a very sparse matrix that represents known user-to-item ratings. We want to predict missing values in this matrix. In order to do this, we represent each user and each item as a vector of latent features, such that dot products of these vectors closely match known user-to-item ratings. The expectation is that unknown user-to-item ratings can be approximated by dot products of the corresponding feature vectors as well. The simplest form of the objective function, which we want to minimize, is:

minimize over x, y:   sum over known ratings (u, i) of (r_ui - x_u · y_i)^2  +  gamma * (sum_u ||x_u||^2 + sum_i ||y_i||^2)

Here, r are the known user-to-item ratings, and x and y are the user and item feature vectors that we are trying to find. As there are many free parameters, we need the regularization part to prevent overfitting and numerical problems, with gamma being the regularization factor.

It is not currently feasible to find the optimal solution of the above formula in a reasonable time, but there are iterative approaches that start from random feature vectors and gradually improve the solution. After some number of iterations, changes in feature vectors become very small, and convergence is reached. There are two commonly used iterative approaches.

Stochastic gradient descent optimization

Stochastic gradient descent (SGD) optimization has been applied successfully to many other problems. The algorithm loops through all ratings in the training data in a random order, and for each known rating r, it makes a prediction r* (based on the dot product of vectors x and y) and computes the prediction error e = r - r*. It then modifies x and y by moving them in the opposite direction of the gradient, yielding simple update formulas for each of the features of x and y.
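As a minimal sketch of these updates (assuming NumPy vectors, an illustrative learning rate lr, and gamma as the regularization factor), one SGD step for a single known rating could look like this:

import numpy as np

def sgd_update(x_u, y_i, r_ui, lr=0.01, gamma=0.02):
    """One SGD step for a single known rating r_ui.
    x_u and y_i are the current user and item feature vectors."""
    e = r_ui - np.dot(x_u, y_i)                 # prediction error
    x_new = x_u + lr * (e * y_i - gamma * x_u)  # move against the gradient w.r.t. x
    y_new = y_i + lr * (e * x_u - gamma * y_i)  # move against the gradient w.r.t. y
    return x_new, y_new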

Alternating least squares

Alternating least squares (ALS) is another method used with nonlinear regression models, when there are two sets of dependent variables (in our case, the vectors x and y). The algorithm fixes one set of parameters (the user vectors x) while optimally solving for the other (the item vectors y) by minimizing the resulting quadratic form. It then alternates between fixing user vectors and updating item vectors, and fixing item vectors and updating user vectors, until the convergence criteria are satisfied.
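As a rough illustration (not the Giraph implementation), one ALS half-step that re-solves every user vector while the item vectors are held fixed can be sketched with NumPy as follows; ratings_by_user, a list mapping each user to (item index, rating) pairs, is an assumed input format:

import numpy as np

def als_update_users(ratings_by_user, Y, num_features, gamma=0.02):
    """Recompute every user vector with the item matrix Y held fixed.
    For each user we solve (sum_i y_i y_i^T + gamma*I) x_u = sum_i r_ui * y_i,
    a regularized least-squares problem over that user's known ratings."""
    X = np.zeros((len(ratings_by_user), num_features))
    for u, user_ratings in enumerate(ratings_by_user):
        A = gamma * np.eye(num_features)
        b = np.zeros(num_features)
        for i, r_ui in user_ratings:
            A += np.outer(Y[i], Y[i])
            b += r_ui * Y[i]
        X[u] = np.linalg.solve(A, b)
    return X

The item half-step is symmetric: hold X fixed and re-solve every item vector from that item's known ratings.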

Standard approach and problems

In order to efficiently solve the above formula in a distributed way, we first looked at how systems that are similar in design to Giraph do it (using message passing instead of map/reduce). The standard approach corresponds to having both users and items as vertices of a graph, with edges representing known ratings. An iteration of SGD/ALS would then send user and/or item feature vectors across all the edges of the graph and do local updates.

There are a few problems with this solution:

  1. Huge amount of network traffic: This is the main bottleneck of all distributed matrix factorization algorithms. Since we send a feature vector across each edge of the graph, the amount of data sent over the wire in one iteration is proportional to #Ratings * #Features (here and later in the text we use # as notation for ‘number of’). For 100 billion ratings and 100 double features, this results in 80 TB of network traffic per iteration. Here we assumed that users and items are distributed randomly, and we are ignoring the fact that some of the ratings can live on the same worker (on average, this should be multiplied by the factor 1 – (1 / #Workers)). Note that smart partitioning can’t reduce network traffic by a lot because of the items that have large degrees, and that would not solve our problem.
  2. Some items in our data sets are very popular, so the item degree distribution is highly skewed: This can cause memory problems — every item receives degree * #Features amount of data. For example, if an item has 100 million known ratings and 100 double features are used, this item alone would receive 80 GB of data. Large-degree items also cause processing bottlenecks (as every vertex is processed atomically), and all workers end up waiting for the few largest-degree items to finish.
  3. This does not implement SGD exactly in the original formulation: Every vertex is working with feature vectors that it received in the beginning of the iteration, instead of the latest version of them. For example, say item A has ratings for users B and C. In a sequential solution, we’d update A and B first, getting A’ and B’, and then update A’ and C. With this solution, both B and C will be updated with A, the feature vector for the item from the beginning of the iteration. (This is practiced with some lock-free parallel execution algorithms and can slow down the convergence.)

Our solution — rotational hybrid approach

The main problem is sending all updates within each iteration, so we needed a new technique of combining these updates and sending less data. First we tried to leverage aggregators and use them to distribute item data, but none of the formulas we tried for combining partial updates on the item feature vectors worked well.

We finally came up with an approach that required us to extend the Giraph framework with worker-to-worker messaging. Users are still represented as the vertices of the graph, but items are partitioned into #Workers disjoint parts, with each part stored in the global data of one of the workers. We put all the workers in a circle and rotate the items clockwise after each superstep, by sending worker-to-worker messages containing the items from each worker to the next worker in line.

This way, in each superstep, we process part of the worker’s user ratings for the items that are currently on the worker, and therefore process all ratings after #Workers supersteps. Let’s analyze the issues the previous solutions had:

  1. Amount of network traffic: For SGD, the amount of data sent over the wire in one iteration is proportional to #Items * #Features * #Workers, and it no longer depends on the number of known ratings. For 10 million items, 100 double features, and 50 workers, this comes to a total of 400 GB, which is 20x smaller than in the standard approach. Therefore, the rotational approach performs much better whenever #Workers <= #Ratings / #Items, i.e., whenever the number of workers is smaller than the average item degree. In all the data sets we use, items with small degrees are excluded from consideration, as they do not make good recommendations and can be just noise, so the average item degree is large. We'll say more about ALS below.
  2. Skewed item degrees: This is no longer a problem — user vertices are the only ones doing processing, and items never hold information about their user ratings.
  3. Computation of SGD: This is identical to a sequential solution, because there is only one version of each feature vector at any point in time, rather than copies of it being sent to many workers and updated independently.

The computation with ALS is trickier than with SGD, because in order to update a user/item, we need all of its item/user feature vectors. The way updates in ALS actually work is that we solve a matrix equation of the form A * X = B, where A is a #Features x #Features matrix and B is a 1 x #Features vector, and A and B are calculated from the user/item feature vectors that form all known ratings for the item/user. So when updating items, instead of rotating just their feature vectors, we can rotate A and B, update them during each of the #Workers supersteps, and calculate the new feature vectors at the end. This increases the amount of network traffic to #Items * #Features^2 * #Workers. Depending on the proportions between all the data dimensions, for some items this is better than the standard approach, and for some it isn't.

This is why a blend of our rotational approach and the standard approach gives the superior solution. For an item with a given degree, the amount of network traffic associated with it is degree * #Features in the standard approach and #Workers * #Features^2 with our rotational approach. We still update items whose degree is less than #Workers * #Features using the standard approach, and we use our rotational approach for all larger-degree items, which significantly improves performance. For example, for 100 double features and 50 workers, the item degree limit for choosing an approach is around 5,000.
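A back-of-the-envelope helper makes this trade-off concrete; it compares the per-item traffic of the two approaches and reproduces the ~5,000 threshold above (the function and its parameters are illustrative, not part of the actual system):

def use_rotational(item_degree, num_workers=50, num_features=100):
    """Return True if the rotational approach moves less data for this item.
    Standard approach:   item_degree * num_features values per iteration.
    Rotational approach: num_workers * num_features**2 values per iteration.
    The break-even degree is num_workers * num_features (5,000 here)."""
    standard = item_degree * num_features
    rotational = num_workers * num_features ** 2
    return rotational < standard

print(use_rotational(4000))    # False -> keep the standard approach
print(use_rotational(100000))  # True  -> rotate this item's A and B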

To solve the matrix equation A * X = B we need the inverse A^-1, for which we use the open-source library JBLAS, which had the most efficient implementation of matrix inversion.

As SGD and ALS share the same optimization formula, it is also possible to combine these algorithms. ALS is computationally more complex than SGD, and we included an option to do a combination of some number of iterations of SGD, followed by a single iteration of ALS. For some data sets, this was shown to help in the offline metrics (e.g., root mean squared error or mean average rank).

We were experiencing numerical issues with large-degree items. There are several ways of bypassing this problem (ignoring these items, or sampling their ratings), but we used regularization based on the item and user degrees, which keeps the values of the user and item vectors within a certain numerical range.

Evaluation data and parameters

In order to measure the quality of recommendations, before running an actual A/B test, we can use a sample of the existing data to compute some offline metrics about how different our estimations are from the actual user preferences. Both of the above algorithms have a lot of hyperparameters to tune via cross-validation in order to get the best recommendations, and we provide other options like adding user and item biases.

The input ratings can be split into two data sets (train and test) explicitly. This is very useful in cases where the test data is composed of all user actions in the time interval after all training instances. Otherwise, to construct the test data, we randomly select T=1 items per user and hold them out from training.

During the algorithm, for a certain percentage of users we rank all unrated items (i.e., items that are not in the training set) and observe where the training and test items land in the ranked list of recommendations. We can then evaluate the following metrics: mean average rank (the position in the ranked list, averaged over all test items), precision at positions 1/10/100, mean of the average precision across all test items (MAP), and so on. Additionally, we compute root mean squared error (RMSE), which amplifies the contributions of the absolute errors between the predictions and the true values. To help monitor convergence and quality of results, we print all these metrics after each iteration.
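A small sketch of a few of these offline metrics, assuming we already have each test item's position in its user's ranked list and the predicted and true ratings (variable names are illustrative):

import numpy as np

def mean_average_rank(test_item_positions):
    """Average position of held-out test items in the ranked lists (lower is better)."""
    return float(np.mean(test_item_positions))

def precision_at_k(ranked_items, relevant_items, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    return len(set(ranked_items[:k]) & set(relevant_items)) / k

def rmse(predicted, actual):
    """Root mean squared error between predicted and known ratings."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))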

On a sample data set with 35 billion weighted training ratings and 0.2 billion test ratings, the following figure shows how RMSE decreases on the training and test sets for #Features=8 and #Features=128, with the other parameters fixed.

Item recommendation computation

In order to get the actual recommendations for all users, we need to find the items with the highest predicted ratings for each user. When dealing with huge data sets, checking the dot product for each (user, item) pair becomes infeasible, even if we distribute the problem across more workers. We needed a faster way to find the top K recommendations for each user, or a good approximation of it.

One possible solution is to use a ball tree data structure to hold our item vectors. A ball tree is a binary tree in which leaves contain some subset of the item vectors, and each inner node defines a ball that surrounds all vectors within its subtree. Using a formula for the upper bound on the dot product between the query vector and any vector within the ball, we can do a greedy tree traversal, going first to the more promising branch and pruning subtrees that can't contain a solution better than the one we have already found. This approach proved to be 10-100x faster than checking each pair, making the search for recommendations on our data sets finish in a reasonable time. We also added an option to allow a specified error tolerance when looking for the top recommendations, to speed up the calculation even more.
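The pruning relies on a standard bound: for any vector v inside a ball with center c and radius rad, the dot product q · v is at most q · c + ||q|| * rad. A simplified, in-memory sketch of the greedy traversal (finding the single best item; top K works the same way with a heap) could look like the following; the node layout is an assumption, not the production format:

import numpy as np

def dot_upper_bound(q, center, radius):
    """Upper bound on q . v for any vector v inside the ball:
    q . v <= q . center + ||q|| * radius."""
    return float(np.dot(q, center) + np.linalg.norm(q) * radius)

def best_item(node, q, best=(-np.inf, None)):
    """Greedy traversal for the single highest dot product.
    A node is a dict with 'center' and 'radius', plus either
    'items' (leaf: list of (item_id, vector)) or 'children'."""
    if dot_upper_bound(q, node['center'], node['radius']) <= best[0]:
        return best                      # prune: this ball cannot beat the current best
    if 'items' in node:                  # leaf: score every item vector it holds
        for item_id, v in node['items']:
            score = float(np.dot(q, v))
            if score > best[0]:
                best = (score, item_id)
        return best
    # visit the more promising child first so pruning kicks in sooner
    for child in sorted(node['children'], reverse=True,
                        key=lambda c: dot_upper_bound(q, c['center'], c['radius'])):
        best = best_item(child, q, best)
    return best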

Another way to solve the problem approximately is to cluster items based on their feature vectors, which reduces the problem to finding the top cluster recommendations and then extracting the actual items from those top clusters. This approach speeds up the computation while slightly degrading the quality of recommendations, based on our experimental results. On the other hand, the items in a cluster are similar, and we can get a diverse set of recommendations by taking a limited number of items from each cluster. Note that we also have a k-means clustering implementation on top of Giraph, so incorporating this step into the calculation was very easy.

Comparison with MLlib

Spark MLlib is a very popular machine learning library that contains one of the leading open-source implementations in this domain. In July 2014, the Databricks team published performance numbers for their ALS implementation on Spark. The experiments were conducted on scaled copies of the Amazon reviews data set, which originally contained 35 million ratings, and ran for five iterations.

In the following graph, we compare our rotational hybrid approach (implemented in Giraph) with the standard approach (implemented in Spark MLlib, including some additional optimizations, such as sending a feature vector at most once to a machine) on the same data set. Due to hardware differences (we had about twice the processing power per machine), we looked at total CPU minutes in order to make a fair comparison. The rotational hybrid solution was about 10x faster.

Additionally, the largest data set on which experiments were conducted with the standard approach had 3.5 billion ratings. With the rotational hybrid approach, we can easily handle more than 100 billion ratings. Note that the quality of results is the same for both; all performance and scalability gains come from the different data layout and decreased network traffic.

Facebook use cases and implicit feedback

We used this algorithm for multiple applications at Facebook, e.g., for recommending pages you might like or groups you should join. As already mentioned, our data sets are composed of more than 1 billion users and usually tens of millions of items. There are actually many more pages and groups, but we limit ourselves to items that pass a certain quality threshold — the simplest version being that the item degree is greater than 100. (Fun side note: At the other extreme, we have some very large-degree pages — the “Facebook for Every Phone” page is actually liked by almost half of Facebook’s current monthly active users.)

Our first iterations included page likes/group joins as positive signals. Negative signals are not as common on Facebook (they include unliking a page or leaving a group after some time), and even these may not mean that a user has negative feedback about the item; he or she might simply have lost interest in the topic or in receiving updates. In order to get good recommendations, there is a significant need to add negative items from the unrated pairs in the collection. Previous approaches include randomly picking negative training samples from unrated items (leading to a biased, non-optimal solution) or treating all unknown ratings as negative, which tremendously increases the complexity of the algorithm. Here, we implemented adding random negative ratings by taking into account the user and item degrees (adding negative ratings proportional to the user degree, sampled according to the item degree distribution), and weighting negative ratings less than positive ones, as we failed to learn a good model with a uniform random sampling approach.

On the other hand, we have implicit feedback from users (whether the user is actively viewing the page, liking, or commenting on the posts in the group). We also implemented a well-known ALS-based algorithm for implicit feedback data sets. Instead of trying to model the matrix of ratings directly, this approach treats the data as a combination of binary preferences and confidence values. The ratings are then related to the level of confidence in observed user preferences, rather than explicit ratings given to items.

After running the matrix factorization algorithm, we have another Giraph job of actually computing top recommendations for all users.

The following code just shows how easy it is to use our framework, tune parameters, and plug in different data sets:

CFTrain(
    ratings=CFRatings(table='cf_ratings'),
    feature_vectors=CFVectors(table='cf_feature_vectors'),
    features_size=128,
    iterations=100,
    regularization_factor=0.02,
    num_workers=5,
)
CFRecommend(
    ratings=CFRatings(table='cf_ratings'),
    feature_vectors=CFVectors(table='cf_feature_vectors'),
    recommendations=CFRecommendations(table='cf_recommendations'),
    num_recommendations=50,
    num_workers=10,
)

Furthermore, one can easily implement other objective functions (such as rank optimization or neighborhood models) by extending the SGD or ALS computation.

Scalable CF

Recommendation systems are emerging as important tools for predicting user preferences. Our framework for matrix factorization and computing top user recommendations is able to efficiently handle Facebook’s massive data sets with 100 billion ratings. It is easy to use and extend with other approaches.

We are thinking about many improvements and algorithms, including:

  • Incorporating the social graph and user connections for providing a better set of recommendations
  • Starting from the previous models instead of random initialization, for recurrent learning
  • Automatic parameter fitting with cross-validation for optimizing the different metrics for a given data set
  • Trying out better partitioning and skipping machines that don’t need certain item data during rotations

We are actively working on recommendations and many other applications on top of Giraph, so stay tuned for more exciting features and development in this field.

Thanks to Dionysios Logothetis, Avery Ching, Sambavi Muthukrishnan and Sergey Edunov from the Giraph team who made this work possible and helped write this story, and Liang Xiong and Bradley Green for early experimentation and all feedback and insights.

 


Evaluating boosted decision trees for billions of users (by Aleksandar Ilic and Oleksandr Kuvshynov, POSTED ON MARCH 27, 2017 TO ML Applications)

 https://engineering.fb.com/2017/03/27/ml-applications/evaluating-boosted-decision-trees-for-billions-of-users/

 

Facebook uses machine learning and ranking models to deliver the best experiences across many different parts of the app, such as which notifications to send, which stories you see in News Feed, or which recommendations you get for Pages you might want to follow. To surface the most relevant content, it’s important to have high-quality machine learning models. We look at a number of real-time signals to determine optimal ranking; for example, in the notifications filtering use case, we look at whether someone has already clicked on similar notifications or how many likes the story corresponding to a notification has gotten. Because we perform this every time a new notification is generated, we want to return the decision for sending notifications as quickly as possible.

More complex models can help improve the precision of our predictions and show more relevant content, but the trade-off is that they require more CPU cycles and can take longer to return results. Given these constraints, we can’t always evaluate all possible candidates. However, by improving the efficiency of the model, we can evaluate more inventory in the same time frame and with the same computing resources.

In this post, we compare different implementations of a type of predictive model called a gradient-boosted decision tree (GBDT) and describe multiple improvements in C++ that resulted in more efficient evaluations.

Decision tree models

Decision trees are commonly used as predictive models, mapping observations about an item to conclusions about the item’s target value. They are one of the most common predictive modeling approaches used in machine learning, data analysis, and statistics because of their non-linearity and fast evaluation. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.

Decision trees are very powerful, but a small change in the training data can produce a big change in the tree. This is remedied by the use of a technique called gradient boosting. Namely, the training examples that were misclassified have their weights boosted, and a new tree is formed. This procedure is then repeated consecutively for the new trees. The final score is taken as a weighted sum of the scores of the individual leaves from all trees.

Models are normally updated infrequently, and training complex models can take hours. At Facebook’s scale, however, we want to update the models more often and run them on the order of milliseconds. Many backend services at Facebook are written in C++, so we leveraged some of the properties of this language and made improvements that resulted in a more efficient model that takes fewer CPU cycles to evaluate.

In the next figure we have a simple decision tree with the following features:

  • the number of clicks on notifications from person A today (feature F[0])
  • the number of likes on the story corresponding to the notification (feature F[1])
  • the total number of notification clicks from person A (feature F[2])

At different nodes, we check the values of the above features and traverse the tree to get the probability of clicking on a notification.

Flat tree implementation

The naive implementation of a decision tree model is a simple binary tree with pointers. However, this is not very efficient, because the nodes are not necessarily stored consecutively in memory, which hurts cache locality.

On the other hand, decision trees are usually full binary trees (binary trees in which each node has exactly zero or two children) and can be stored compactly using vectors. No space is required for pointers; instead, the parent and children of each node can be found by arithmetic on array indices. We will use this implementation for comparison in the experimentation section.
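A minimal sketch of that pointer-free layout, assuming the nodes of a full binary tree are stored in flat arrays with the children of node i at indices 2*i + 1 and 2*i + 2 (an illustration of the idea, not Facebook's implementation):

import numpy as np

class FlatTree:
    """Full binary decision tree stored in flat arrays, without pointers.
    feature[i] and threshold[i] describe internal node i; leaves carry their
    score in value[i] and are marked with feature[i] == -1."""
    def __init__(self, feature, threshold, value):
        self.feature = np.asarray(feature)
        self.threshold = np.asarray(threshold)
        self.value = np.asarray(value, dtype=float)

    def predict(self, F):
        i = 0
        while self.feature[i] != -1:               # walk down until we hit a leaf
            if F[self.feature[i]] < self.threshold[i]:
                i = 2 * i + 1                      # left child
            else:
                i = 2 * i + 2                      # right child
        return self.value[i]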

Compiled tree implementation

Each binary tree can be represented as a complex ternary expression, which can be compiled and linked into a dynamic library that the service loads and calls directly. Note that this lets us add or update the decision tree model in real time without restarting the service.

We can also take advantage of LIKELY/UNLIKELY annotations in C++. They are directions for the compiler to emit instructions that will cause branch prediction to favor the “likely” side of a jump instruction. If the prediction is correct, it means that the jump instruction will take zero CPU cycles. We can compute branch predictions based on the real samples from the ranking in batches or from the offline analysis, as the distributions from training and evaluation sets should not change much.

The following code represents an implementation of the above simple decision tree:

// LIKELY/UNLIKELY expand to compiler branch-prediction hints (GCC/Clang).
#define LIKELY(x)   (__builtin_expect(!!(x), 1))
#define UNLIKELY(x) (__builtin_expect(!!(x), 0))

double tree(const double* F) {
  return LIKELY(F[0] < 1)
    ? (UNLIKELY(F[1] > 10) ? (F[2] > 50 ? 0.1 : 0.05) : 0.01)
    : (F[2] > 100 ? 0.5 : 0.2);
}

Model range evaluation

In a typical ranking setup, we need to evaluate the same trained model on multiple instances of feature vectors. For example, we need to rank ~1,000 different potential candidates for a given person, and pick only the most relevant ones.

One approach is to iterate through all candidates and rank them one by one. Typically, the model and all the candidates cannot fit into the CPU cache together. Motivated by the boosted training, we can split the model into ranges of trees (the first N trees, then the next N trees, and so on), so that each range is small enough to fit in cache. Then we flip the evaluation order: instead of evaluating all trees on one sample, we evaluate all samples on each range. This way we can fit the whole tree range in the CPU cache together with all the feature vectors, and in the next iteration just replace the tree range. This reduces the number of cache misses and turns scattered memory accesses into block reads/writes.
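A sketch of that reordered loop, assuming the model has already been split into cache-sized ranges of trees (the names and structure are illustrative):

def evaluate_in_ranges(tree_ranges, feature_vectors):
    """Evaluate a boosted model that has been split into ranges of trees.
    Keep one small range of trees hot in cache, score every candidate
    against it while accumulating partial sums, then move to the next range."""
    scores = [0.0] * len(feature_vectors)
    for tree_range in tree_ranges:            # each range fits in the CPU cache
        for i, F in enumerate(feature_vectors):
            for tree in tree_range:
                scores[i] += tree.predict(F)  # e.g., the FlatTree sketch above
    return scores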

Furthermore, we often have multiple models that we need to evaluate on the same feature vectors; for example, the probability of the user clicking on, liking, or commenting on the notification story. Keeping all the feature vectors in the CPU cache and evaluating the models one by one helps here as well.

Another trade-off that we can make is to rank all candidates for the first N trees and then, due to the nature of boosted algorithms, discard the lowest-ranked candidates. This can improve the latency, but it comes with a slight drop in accuracy.

Common features

Sometimes features are common across all feature vectors. In the above example for a certain person, we need to rank all candidate notifications. The feature vectors [F[0], F[1], F[2]] described above can be something like this:

[0, 2, 100]
[0, 0, 100]
[0, 1, 100]
[0, 1, 100]

It’s easy to notice that the features F[0] and F[2] are the same for all candidates. We can significantly reduce the decision tree size by focusing only on the value of F[1], and thus improve the evaluation time.
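A minimal sketch of the batch-level check that identifies such features (purely illustrative); comparisons on these features resolve the same way for every candidate, so the corresponding branches can be pre-resolved once per batch:

import numpy as np

def common_features(feature_vectors):
    """Return the indices of features whose value is identical for every
    candidate in the batch."""
    F = np.asarray(feature_vectors)
    return np.flatnonzero(np.all(F == F[0], axis=0))

# For the example batch above, F[0] and F[2] are detected as common:
print(common_features([[0, 2, 100], [0, 0, 100], [0, 1, 100], [0, 1, 100]]))  # [0 2]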

Categorical features

Most ML algorithms use binary trees, and this can be extended to k-ary trees. On the other hand, some of the features are not actually comparable and are called categorical features. For example, if country is a feature, one cannot say whether the value is less than or equal to “US” (or a corresponding enum value). If the tree is deep enough, this comparison can be achieved using multiple levels, but here we implemented the possibility for checking whether the current feature belongs to a set of values.

The learning method is not changed much — we still try to find the best subset to split on, and the evaluation is very fast. Based on the size of the set, we use the in_set C++ implementation or just concatenate if conditions. This reduces the model size and helps in convergence as well.
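As a small sketch of the evaluation side of such a split, a categorical node just stores the set of values that send a sample down one branch; set membership replaces the feature-versus-threshold comparison used for ordered features (the function name and layout are illustrative):

def categorical_goes_left(feature_value, left_categories):
    """Branch decision for a categorical split: a membership test instead of
    a numeric comparison."""
    return feature_value in left_categories

# e.g., a node splitting on country:
print(categorical_goes_left("US", {"US", "CA", "GB"}))  # True
print(categorical_goes_left("DE", {"US", "CA", "GB"}))  # False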

Computational results

We trained a boosted decision tree model for predicting the probability of clicking a notification using 256 trees, where each of the trees contains 32 leaves. Next, we compared the CPU usage for feature vector evaluations, where each batch was ranking 1,000 candidates on average. The batch size value N was tuned to be optimal based on the machine L1/L2 cache sizes. We saw the following performance improvements over the flat tree implementation:

  • Plain compiled model: 2x improvement.
  • Compiled model with annotations: 2.5x improvement.
  • Compiled model with annotations, ranges and common/categorical features: 5x improvement.

The performance improvements were similar for different algorithm parameters (128 or 512 trees, 16 or 64 leaves).

Conclusion

These improvements in CPU usage and resulting speedups allowed us to increase the number of examples we rank or increase the model size using the same amount of resources. This approach has been applied to several ranking models at Facebook, including notifications filtering, feed ranking, and suggestions for people and Pages to follow. Our machine learning platforms are constantly evolving; more precise models combined with more efficient model evaluations will allow us to continually improve our ranking systems to serve the best personalized experiences for billions of people.


 

LinkedIn profile of Aleksandar Ilic, Principal Software Engineer at Meta

 https://www.linkedin.com/in/aleksandar-ilic-a1650215



Aleksandar Ilic

Principal Software Engineer at Meta

Mountain View, California, United States · 500+ connections

 About

- Privacy lead for cross account data sharing
- Leading Identity Platform team of 50+ people for the Facebook Family of Apps (Facebook, Instagram, WhatsApp, Oculus) with product applications in Growth, Ads, Analytics, Integrity.
- Built notifications system team at Facebook for targeting and optimizing notifications sending via push, SMS and email
- Senior researcher in the spectral and chemical graph theory, combinatorics, algorithms and social networks (over 100 publications https://scholar.google.com/citations?user=qnL0jnwAAAAJ&hl=en)

 Experience

  • Meta

    Principal Software Engineer

    Meta

    - Present · 4 months

  • Facebook

    Principal Software Engineer

    Facebook

    - 10 years 2 months

    Palo Alto, CA

    - Company privacy lead for cross app data sharing
    - Leading Identity Platform team of 50+ people for the Facebook Family of Apps with applications in Growth, Ads, Accounting, Integrity.
    - Previously team lead (20+ people) on Notifications system used by all teams at Facebook for targeting and optimizing notifications sending via push, SMS, email
    - Previously part of Facebook growth team and responsible for adding 20+ million incremental MAP and DAP users
    - Designed multiple large…


  • Wowd

    Software developer

    Wowd

    - 3 years 2 months

    Team leader for classification and indexing.

    Working on: hot list aggregation and distribution, web pages category classification, topics summary generation, user clustering.

  • Assistant Professor

    Faculty of Sciences and Mathematics, University of Nis

    - 3 years 1 month

    Courses: Parallel and concurrent programming, Introduction to informatics, Design and analysis of algorithms, Introduction to graph theory and combinatorics, Discrete structures

  • Accordia Group LLC

    Software engineer

    Accordia Group LLC

    - 1 year 1 month

    Working on Information Extraction System and Semantic Search

Education

  • Faculty of Sciences and Mathematics

    PhD in Computer Science

    -

  • Faculty of Sciences and Mathematics

    Bachelor of Mathematics, Mathematics and Informatics

    -

Publications

Evaluating boosted decision trees for billions of users

Facebook Engineering

Sunday, 9 January 2022

Use of Facebook for peer-to-peer communication, still available on Facebook under the hashtag #OpBlackOct. The pages were deleted, but most of the material remained.

 

18 September 2016 ·
+ Target IP: 98.129.63.26
+ Target Hostname: clinton.com
+ Target Port: 80
slowhttptest -c 1000 -B -g -o my_body_stats -l 9999 -r 200 -s 8192 -w 512 -y 1024 -n 5 -z 32 -k 3 -t FAKEVERB -u http://www.clinton.com -x 10 -p 6
#TangoDown Clinton Equity Strategies/Financial Group #offline Fuck the Clintons! #OpIcarus Phase 4 #OpBlackOct #Killary
Nr. 29 on the list. http://www.clinton.com
Target List: 100 Hedge Funds - http://www.barrons.com/arti…/best-100-hedge-funds-1466223924
[+] Emails found:
------------------
bill@clinton.com
clinton@clinton.com
anusha.bhatnagar@clinton.com
hillary@clinton.com
quantjobs@clinton.com
huma@clinton.com
pantsuit16@clinton.com
mlewinsky@clinton.com
Hrd17@clinton.com
confirm@clinton.com
compliance@clinton.com
HRCxxx@clinton.com
tom@clinton.com
crews@clinton.com
itamar.rosenn@clinton.com
HRC@clinton.com
admin@clinton.com
das@clinton.com
vvtv@clinton.com
bill@clinton.com
francis.ruchalski@clinton.com
pjo@clinton.com
Ankersen@clinton.com
cheater@clinton.com
nader.behbehani@clinton.com
stevel@clinton.com
tary@clinton.com
Hillary@clinton.com
hilldawg@clinton.com
VVTV@clinton.com
humapuma@clinton.com
madam_im_rodham@clinton.com
dontbefooledagain@clinton.com
postmaster@clinton.com

Brunch with Sifted: psychedelic investor Christian Angermayer (By Michael Stothard 16 November 2020)

 https://sifted.eu/articles/christian-angermayer-compass-atai/?fbclid=IwAR01wM8wnNdzymcdMgFejl4sYM42j5jYwlfd95JfwCDU-e7xY3nLE_bZnHI

 

Brunch with Sifted: psychedelic investor Christian Angermayer

Instrumental in the founding of Compass Pathways and chairman of Atai Life Sciences, Christian Angermayer has been leading a global psychedelic renaissance. 

 The art on the 29th floor London penthouse apartment of German tech investor and entrepreneur Christian Angermayer has two themes: psychedelics and gay emperors.

One corner has a 2,300-year-old marble bust of Alexander the Great’s lover Hephaestion. To his right sits a 2,000-year-old bust of the Roman emperor Hadrian next to a picture of his lover Antinous.

But the 42-year-old is most proud of his 2,400-year-old bust of Demeter, the ancient Greek goddess revered in an ancient ritual which, according to Angermayer and a growing body of research, involved dosing figures such as Plato and Socrates with powerful psychedelics.

  “There is more and more evidence that psychedelics were a foundation of western civilisation,” he says.

This excitement makes sense for a man who, over the past five years, has been perhaps the single biggest financial driving force behind shifting psychedelic drugs from a dusty relic of 1960s’ counterculture into the medical mainstream as a legal medicine to treat mental illnesses.

While today there is a growing body of clinical trials and early approvals from regulators, before Angermayer this field was seen as largely the preserve of hippies and dreamers.

In September, the company he helped found in 2017, Compass Pathways, which has developed a synthetic version of psilocybin (the active ingredient in magic mushrooms) and is conducting the world’s first large-scale therapy clinical trial using the drug, listed in New York. His 28% stake, which cost him around $55m over multiple rounds, became worth more than $400m.

Next spring, the biotech company he founded, Atai Life Sciences — which is pursuing a large range of treatments for mental-health disorders using other, more obscure drugs such as DMT, ar-ketamine and ibogaine — is set to go public as well, at an expected $1bn-$2bn valuation, according to bankers familiar with the process.

With these high-profile exits and an increasingly large fortune — his investment company, Apeiron Investment Group, manages around $750m of his own money and $600m of others’ and invests in longevity research, biotech, fintech, spacetech, crypto and entertainment  — Angermayer is becoming one of Europe’s most powerful and influential tech investors.

He has also won the ear of political leaders — his sideboard has pictures of him with German chancellor Angela Merkel, Austrian chancellor Sebastian Kurz and European Central Bank head Christine Lagarde — and now regularly co-invests with renowned US investors such as Peter Thiel, cofounder of PayPal and an early investor in Facebook, Steve Jurvetson, an early investor in and board member of Tesla and SpaceX, and Michael Novogratz from Galaxy Investment Partners.

 “Christian is successful because he is curious,” Thiel told Sifted in an email exchange. “He finds because he seeks — both ideas and people. That sounds simple, but it’s rare.”

 

The first trip

The rule of Brunch with Sifted is that the interviewee picks the restaurant and Sifted picks up the bill. But as London is in the midst of a pandemic, restaurants feel a bit risky, so I turn up at his apartment cum office near Old Street in east London where he has prepared a business brunch of smoked salmon, ham, croissant, coffee, cheese, melon and strawberries. 

His cousin, Stephan — not the only handsome young family member hanging around his apartment on a Saturday — is making some scrambled eggs with just the right hint of pepper. 

Angermayer is wearing a grey jumper with the chemical symbol for psilocybin on the chest (he has the same symbol tattooed on his arm). He starts with a tour of the art, pointing out the psychedelic pieces, the gay emperors and their lovers as well as a golden triptych made by his friend Landon Ross which — he says with a flourish —  “contains only elements created in the last three seconds of a dying star… that’s basically where we all come from and end up.”

When we sit down it becomes apparent that this luxurious spread isn’t his usual Saturday morning fare: he chooses delicately to avoid any carbs — which is part of a regime to help extend his life — while I pile bread and ham onto my plate.

I start by asking him about Compass Pathways and how he got involved with psychedelics. Or more importantly: when was his first trip?

 

It started, he says, with friends on the Caribbean island of Canouan (where, he notes, it’s legal) back in 2014. He had resisted pressure for years: he had never drunk, smoked or done any drugs before so he was reluctant to try psilocybin. But when he finally tried he found it such a profound experience he became a “die-hard convert” to its positive benefits to mental health. 

“All my trips together now make some of the most meaningful experiences of my entire life,” he says.

He began searching for a way to turn this into a business, when his friend Michael Novogratz in 2017 introduced him to George Goldsmith and Ekaterina Malievskaia (who now lead Compass Pathways) and who had similar ideas about the benefits of psilocybin for mental health. 

A few weeks later Angermayer had promised the couple the £3m of investment they needed to start a real business. “They had spent years in meetings getting nowhere, but I stopped them after 10 minutes and said, ‘I am in’.” (The £3m was split between Novogratz, Thiel and him.)

Just a few years later Compass Pathways is doing psilocybin therapy clinical trials in 20 sites across nine countries in Europe and North America and is now getting so big in this field that it’s faced criticism for trying to create a magic mushroom monopoly by patenting an artificial form of psilocybin, a critique Angermayer rejects. 

“We are not coming after the hippies growing this organically at home,” he says. “We are patenting this so we can bring the synthetic version of this natural drug to the hundreds of millions of people who need it for medical conditions and can only get it if it’s prescribed by doctors and paid by their health insurance.”

 

A Chaga mushroom coffee and the first few startups

Angermayer at this point wants a coffee so walks over to the kitchen. “I am having a mushroom coffee, would you like one?” I look startled, but only for a moment, as he assures me it’s just coffee with Chaga mushroom (a popular “wellness” supplement) with some extra Lion’s Mane mushroom extract pipetted in. All non-psychedelic.

Now sipping our pleasant and slightly earthy coffee, he tells me that he didn’t grow up in a world of drug research and billionaires. He was born in a small town in northern Bavaria to a mother who is a secretary and father who was an engineer. “If you had asked my parents what I should be, they would have said teacher or doctor.” 

But by the age of 20 he was dropping out of university to found his first biotech company, Ribopharma, with two of his professors; it eventually merged with Alnylam Pharmaceuticals and listed on the stock exchange (now worth $15bn). Ever since then he has been starting tech companies and investing, initially with four friends and now alone through Apeiron.

One of his big breaks came during the European sovereign debt crisis where, thanks to some political connections, he became known among elite investor circles worldwide as someone who understood the problem and — more importantly — how Germany saw it. This made him friends and eventually coinvestors.

Living forever

Today, on top of psychedelic research, Angermayer is one of the most prominent investors in the field of longevity, where he is heavily involved in two companies, Rejuveron Life Sciences in Switzerland and Cambrian Biopharma in the US.

The science behind longevity — pioneered by figures such as Aubrey de Grey and Ray Kurzweil — is today centred around finding specific treatments around the nine “hallmarks of ageing” to help to halt or even reverse the ageing process. 

Angermayer is also part of a movement pushing for ageing to be classified as a disease so it would be easier to do clinical trials on anti-ageing drugs. 

“People say that dying is natural, but it’s really things going wrong in your body,” he says. “For thousands of years, we have merely devoted ourselves to the conviction that ageing and death are inevitable and ‘natural’, and have integrated this narrative into our culture and all religions. Science is just beginning to better understand the reasons for ageing and dying, and a disease that we understand will eventually be cured.”

But do we want to live forever? “I don’t think people will want to live forever, but I think they would like it to be their choice to die after say 150 or 200 years rather than getting killed by the ageing disease.” He adds: “Some people will find it hard to cope with an ever quicker changing world, but psychedelics could help with that as well. They could help us keep the lust for life alive.”

 

The future

Angermayer has a lot on beyond psychedelics and living forever. One of his investments at the moment, AbCellera, an antibody therapy company he is closely involved with which has been part of the fight to develop a treatment for coronavirus, has hired bankers and is set to IPO within the next months for several billion dollars, according to people close to the company.

He is an investor in Deposit Solutions, the German fintech unicorn, and two European space leaders Mynaric and Isar Aerospace, to name a few others. He says he has raised $500m for his portfolio companies in the past six months and is hands-on operationally with most of them. He connects people: it was him who last year put Japan’s SoftBank in touch with now-collapsed German fintech Wirecard about an investment. He has made films such as Filth, Aspern Papers and Hector and the Search for Happiness.

Oh, and he is on the advisory council of Rwandan president Paul Kagame. This came after, in 2009, he bought a bank in Rwanda after being impressed by meeting the leader at dinner in Germany. It was “the only thing that was being privatised at that moment”, he says. He expanded the lender and later sold it to Bob Diamond, the former Barclays CEO.

But what does the future hold, I ask? By this point we’ve been talking for three hours, the afternoon is marching on, and Angermayer is even breaking his no-carbs rule nibbling on the corner of a croissant.

Pausing for a moment, leaning back in his chair, he says that beyond tech investments and dealmaking he has a long-term vision that, somewhere down the road, psychedelics could shape global politics for the better.

He is, for instance, funding a study by Robin Carhart-Harris, head of psychedelic research at Imperial College in London, to study the effect of Palestinians and Israelis drinking the psychedelic drug Ayahuasca together.

 He is also spreading the message more widely in his powerful circles. “I am a partner of the Munich Security Conference, [a high profile event in international relations], and at one dinner, I joked with them that I was going to spike their drink with psilocybin…”

He adds quickly: “I mean, I was completely joking, of course. But I also told them that there had been times in history when world leaders used the power of psychedelics to become better leaders, especially for conflict resolution purposes. Looking at a world which, despite best intentions, becomes increasingly divided, maybe we have to think radically differently.”

He also has a European agenda, believing that his investment focus areas, biotech and deeptech, are ones where Europe, after losing out to the US over the past decade in creating the big ecommerce and social platforms, has a chance of catching up. “We have some of the best engineers and researchers in the world,” he says.

Governments he is less enamoured of, criticising the German government’s latest change in the law to make it easier for startups to offer stock options as not going far enough, saying that after years of efforts from business it was only a tiny change to the rules. “It’s like having sex once after 10 years, and not even good sex, and then expecting your partner to be satisfied… No, it is still far too little, and far from perfect. We have to do better.”

And on that note, it’s time to leave, saying a socially-distanced goodbye, walking into his private lobby (where there is an 18th-century bronze statue of the Roman gladiator called Retiarius), taking the lift down and getting on my bike back home.

 



 

 

 

 

The Facebook Papers consortium is growing, and reporters are gaining access to more documents (Analysis by Brian Stelter, CNN Business Updated 1452 GMT (2252 HKT) October 26, 2021)

 https://edition.cnn.com/2021/10/26/media/reliable-sources-facebook-papers-consortium/index.html?fbclid=IwAR2UBc1zOa0cGL3WMIyNg8dbHgmrSQJ5WM3WkRNg28HuHLmNP67xrsFBfKc

 

New York (CNN Business) A version of this article first appeared in the "Reliable Sources" newsletter. You can sign up for free right here.

The Facebook Papers consortium is growing.
Last week the number of American news outlets with access to internal Facebook documents supplied to the SEC by Frances Haugen stood at 17. Those outlets -- from CNN to Politico, Washington Post to WIRED -- agreed to a Monday morning embargo, which is why more than 50 stories all came out on the same day.
There are many more stories in the works -- and there are more newsrooms joining the consortium. Platformer's Casey Newton wrote Monday night that "a host of new publications joined the consortium today, ensuring another volley of coverage designed to squeeze more juice from the rind."
One of the new participants, Shoshana Wodinsky of Gizmodo, posted a tweet alluding to her "suddenly" becoming part of the group. I also hear that The Guardian, which was missing last week, is now on board, along with CNBC and The New York Post. As Ben Smith of The New York Times reported, the competing-yet-coordinating newsrooms are keeping in touch via Slack.
Another one of the members, The Associated Press, has a handy explainer of the arrangement here. "Each member of the consortium pursued its own independent reporting on the document contents and their significance," the wire said. "Every member also had the opportunity to attend group briefings to gain information and context about the documents."

The documents will keep flowing for weeks

The "Facebook Papers" are not just a one- or two-day story. Reporters and editors are expecting to receive additional documents for at least the next couple of weeks.
Remember, this is all based on what Haugen submitted to the SEC. Redacted versions of the documents are being shared with members of Congress and members of the news media on an ongoing basis. "That process continues as Haugen's legal team goes through the process of redacting the SEC filings," for instance by removing names of Facebook users, The AP explained.
Newton said in his Monday night newsletter that "I continue to receive new documents every weekday," adding, "The documents arrive with no particular eye toward organization or theme." He said "it's extraordinary to be able to read these documents and learn more about the company," but also acknowledged that the drip-drip-drip serves Haugen's interests.
Whether there will be sustained public interest in the drip-drip-drip is an open question. During a live audio chat with Newton and other reporters on Twitter Monday night, veteran tech journalist Steven Levy predicted that assignment editors and readers would tire of the stories before long.
Laura McGann, a former editor at Vox and Politico, commented in a tweet that "the Facebook revelations are a good example of how exposes will always be more enticing to media and audience than the same story that was observably true all along: We saw with our own eyes that Facebook pushes content that riles people up."

Eight recommended reads

Katie Harbath is keeping a Google Doc with a list of every "Facebook Papers" story. Protocol has a list going, too. At the risk of leaving out lots of impressive journalism, I want to highlight a handful of stories that are worth spending time with:
-- "How Facebook Fails 90 Percent of its Users:" That's the provocative title of an article by The Atlantic's projects editor Ellen Cushing. "These documents show that the Facebook we have in the United States is actually the platform at its best," she writes. "In the most vulnerable parts of the world — places with limited internet access, where smaller user numbers mean bad actors have undue influence — the trade-offs and mistakes that Facebook makes can have deadly consequences."
-- Here's an example via CNN's Eliza Mackintosh: "Facebook knew it was being used to incite violence in Ethiopia. It did little to stop the spread, documents show."
-- "Facebook has known it has a human trafficking problem for years," and still hasn't fully fixed it, CNN's Clare Duffy writes. Duffy used search terms listed in FB's internal research to find active Instagram accounts purporting to offer domestic workers for sale, and the company removed the accounts and posts due to her inquiry.
-- In the internal documents supplied by Haugen, showing conversations among rank and file FB employees, "a common theme is anger." John Hendel's piece for Politico highlights some of the most striking comments.
-- Steven Levy's WIRED headline hitting on a similar theme: "Facebook failed the people who tried to improve it."
-- One of USA Today's headlines: FB "says it's winning the fight against hate speech targeting Black Americans. Its own research says otherwise."
-- Get acquainted with the "Single User Multiple Accounts" problem: "How Facebook users wield multiple accounts to spread toxic politics."
-- Bloomberg's story about a business quandary that Mark Zuckerberg is determined to address: "Facebook, alarmed by teen usage drop, left investors in the dark."

What we've lost

On Tuesday the New York Times Book Review editor Pamela Paul is coming out with a rather timely book titled "100 Things We've Lost to the Internet." It's a thoughtful tour of the pre-Internet age, and it inspired me to ask for her read of this present moment.
"I think the Facebook revelations are really just confirmations of what many people have long suspected or been through personally. It's just that now we know that the effects of social media that we've perceived or felt on a gut (read: sucker punch) level were well known and even deliberate on the part of the company," Paul wrote to me.
"Take, for example, the known effects of 'likes' or the absence thereof, on teens' self-esteem," she wrote. "When you think back to the Before Times, teenagers *always* worried about what their peers thought. The problem now is that they *know* — and their worst fears are often confirmed in easily quantifiable and public ways. In the book, I write about how some of those formerly common human experiences and emotions — uninhibitedness, secrets, unpopular opinions, ignoring people, private observances— are now gone, or severely compromised. Think about how Instagram alone changes teenage life. What would high school have been like for Romy and Michelle if they'd been on Instagram?"